Abstract


Asynchronous stochastic systems are abundant in the real world. Examples include queuing systems, telephone
exchanges, and computer networks. Yet, little attention has been given to such systems in the model
checking and planning literature, at least not without making limiting and often unrealistic assumptions regarding
the dynamics of the systems. The most common assumption is that of history-independence: the
Markov assumption. In this thesis, we consider the problems of verification and planning for stochastic processes
with asynchronous events, without relying on the Markov assumption. We establish the foundation
for statistical probabilistic model checking, an approach to probabilistic model checking based on hypothesis
testing and simulation. We demonstrate that this approach is competitive with state-of-the-art numerical
solution methods for probabilistic model checking. While the verification result can be guaranteed only
with some probability of error, we can set this error bound arbitrarily low (at the cost of efficiency). Our
contribution in planning consists of a formalism, the generalized semi-Markov decision process (GSMDP),
for planning with asynchronous stochastic events. We consider both goal directed and decision theoretic
planning. In the former case, we rely on statistical model checking to verify plans, and use the simulation
traces to guide plan repair. In the latter case, we present the use of phase-type distributions to approximate a
GSMDP with a continuous-time MDP, which can then be solved using existing techniques. We demonstrate
that the introduction of phases permits us to take history into account when making action choices, and this
can result in policies of higher quality than we would get if we ignored history dependence.
Chapter 1


Introduction 


Stochastic processes with asynchronous events (and actions) are abundant in the real world. The canonical
example is a simple queuing system with a single service station, modeling, for example, your local post
office. Customers arrive at the post office, wait in line until the service station is vacant, spend time being
serviced by the clerk, and finally leave. We can think of the arrival and departure (due to service completion)
of a customer as two separate events. There is no synchronization between the arrival and departure of customers,
i.e. the two events just introduced are asynchronous, so this is clearly an example of an asynchronous
system. Other examples of asynchronous systems include telephone exchanges and computer networks.

When we talk about stochastic processes, we are primarily concerned with random variations in the
timing of events, for example the duration of a phone call (timing of a “hang up” event) or the lifetime of
an electronic component (timing of a “fail” event). We assume that we are given a probability distribution
accurately capturing the timing of events. We do not concern ourselves with how these probability distributions
are obtained, although we expect them to be based on a collection of empirical measurements to which
we fit an analytic distribution function. For example, the duration of a phone call is typically modeled using
an exponential distribution, while component lifetime often is found to match a Weibull distribution.

1.1  Two  Problems 

In this thesis, we consider two separate problems concerning stochastic processes with asynchronous events:
verification and planning. For verification, we are given a system, or a model of a system, and are asked
to determine whether the system satisfies some given property. The solution to a verification problem is a
“yes” or “no” answer. In the case of a telephone exchange, for example, we may want to verify that the
probability is at least 0.9999 that no calls are dropped in a 24-hour period. For planning purposes, we inject
a decision dimension into the model, and are asked to find a course of action that will enable a goal to
be attained or some expected reward to be maximized. For instance, for a network of computers, we can
introduce different service actions and then try to find a service policy that will give us the best value.

Verification  and  planning  can  be  seen  as  vital  steps  in  the  development  of  functional  systems.  Through 
planning,  we  obtain  a  system  design,  and  verification  is  used  to  ensure  that  the  system  design  is  satisfactory. 

1.1.1  Verification 

Probabilistic  verification  of  continuous-time  stochastic  processes  has  received  increasing  attention  in  the 
model  checking  community  in  the  past  five  years,  with  a  clear  focus  on  developing  numerical  solution 
methods  for  model  checking  of  continuous-time  Markov  processes.  Numerical  techniques  tend  to  scale 
poorly  with  an  increase  in  the  size  of  the  model  (the  “state  space  explosion  problem”),  however,  and  are 
feasible  only  for  restricted  classes  of  stochastic  discrete  event  systems. 

We  present  a  statistical  approach  to  probabilistic  model  checking,  employing  hypothesis  testing  and 
discrete  event  simulation.  Our  solution  method  works  for  any  discrete  event  system  that  can  be  simulated, 
and  can  be  used  to  verify  systems  too  large  for  numerical  analysis.  Since  we  rely  on  statistical  hypothesis 
testing,  we  cannot  guarantee  that  the  verification  result  is  correct,  but  we  can  at  least  bound  the  probability 
of  generating  an  incorrect  answer  to  a  verification  problem.  Another  advantage  of  our  model  checking 
algorithm,  as  with  most  statistical  solution  methods,  is  that  it  is  trivially  parallelizable,  so  we  can  solve 
problems  faster  in  a  distributed  fashion  by  utilizing  multiple  interconnected  computers. 

1.1.2  Planning 

Planning for stochastic processes with asynchronous events and actions has received little attention in the
artificial intelligence (AI) literature, although some attention has recently been given to planning with concurrent
actions. Guestrin et al. (2002) and Mausam and Weld (2004) use discrete-time Markov decision
processes (MDPs) to model and solve planning problems with concurrent actions, but the approach is restricted
to instantaneous actions executed in synchrony. Rohanimanesh and Mahadevan (2001) consider
planning problems with temporally extended actions that can be executed in parallel. By restricting the temporally
extended actions to Markov options, they obtain planning problems that can be modeled as discrete-time
semi-Markov decision processes (SMDPs).

All three of the approaches cited above model time as a discrete quantity. This is a natural model of
time for synchronous systems driven by a global clock. Asynchronous systems, on the other hand, are best
represented using a dense (continuous) model of time (Alur et al. 1993). Continuous-time MDPs (Howard
1960) can be used to model asynchronous systems, but are restricted to events and actions with exponential
trigger time distributions. Continuous-time SMDPs (Howard 1971b) lift the restriction on trigger time
distributions, but cannot model asynchrony.

We introduce the generalized semi-Markov decision process (GSMDP), based on the GSMP model
of discrete event systems (Glynn 1989), as a model for asynchronous stochastic decision processes. A
GSMDP, unlike an SMDP, remembers whether an event enabled in the current state has been continuously enabled
in previous states without triggering. This is key in modeling asynchronous processes, which typically
involve events that race to trigger first in a state, but the event that triggers first does not necessarily disable
the competing events. For example, if a customer is currently being serviced at the post office, the fact that
another customer arrives does not mean that the service of the first customer has to start over from scratch.
By including a real-valued clock for each event in the description of states, we can model a GSMDP as an
MDP, but this will be a general state space, continuous-time MDP.
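A minimal Python sketch can make the role of per-event clocks concrete. It assumes that the post office example reduces to a single-server queue with two events, an arrival that is always enabled and a departure that is enabled whenever a customer is present; `simulate_queue` and its sampler arguments are hypothetical names chosen for illustration, not part of any tool described in this thesis. Each enabled event keeps its own remaining time until triggering. When one event triggers, the other clocks only lose the elapsed time, so an arrival never restarts a service that is already in progress.

```python
import random


def simulate_queue(arrival_sampler, service_sampler, horizon, seed=0):
    """Simulate a single-server queue as a GSMP with one clock per enabled event.

    Hypothetical sketch: each sampler takes a random.Random instance and returns
    a positive delay. Returns (time, number of customers) pairs, one per state
    change, up to `horizon`.
    """
    rng = random.Random(seed)
    t, n = 0.0, 0                                 # current time and queue length
    clocks = {"arrive": arrival_sampler(rng)}     # remaining time for each enabled event
    trace = [(t, n)]
    while True:
        # The enabled event with the smallest remaining time wins the race.
        event, delay = min(clocks.items(), key=lambda kv: kv[1])
        if t + delay > horizon:
            break
        t += delay
        # Every other enabled event keeps its clock, reduced by the elapsed time.
        for e in clocks:
            clocks[e] -= delay
        del clocks[event]
        if event == "arrive":
            n += 1
            clocks["arrive"] = arrival_sampler(rng)   # arrivals stay enabled
        else:
            n -= 1
        # "depart" gets a fresh clock only when it becomes newly enabled, so an
        # arrival during an ongoing service leaves the service clock untouched.
        if n > 0 and "depart" not in clocks:
            clocks["depart"] = service_sampler(rng)
        trace.append((t, n))
    return trace
```

For instance, with deterministic delays, `simulate_queue(lambda rng: 1.0, lambda rng: 1.5, horizon=2.6)` yields `[(0.0, 0), (1.0, 1), (2.0, 2), (2.5, 1)]`: the first customer, whose service began at time 1.0, departs at 2.5 even though a second customer arrived at 2.0. Resetting the service clock on every arrival would instead postpone that departure to 3.5.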

We present two different solution methods for GSMDPs. First, we consider the problem of planning
for goal achievement, and present a planning framework based on the Generate, Test and Debug (GTD)
paradigm introduced by Simmons (1988). This work ties together our efforts in planning and verification.
The second solution method is based on a decision theoretic framework, and we present the use of phase-type
distributions (Neuts 1981) to approximate a GSMDP with a continuous-time MDP that can then be
solved exactly (or approximately).


1.2  Summary  of  Research  Contribution 


Stochastic models with asynchronous events can be rather complex, in particular if the Markov assumption
does not hold, as is the case when event delays are not exponentially distributed in continuous-time models. Many
phenomena in nature are, in fact, best modeled with non-exponential distributions, for example, the lifetime
of a product (Nelson 1985) or a computer process (Leland and Ott 1986). Yet, the Markov assumption is
commonly made, and the attention in the AI planning literature, in particular, is given almost exclusively
to discrete-time models, which are inappropriate for asynchronous systems. We believe, however, that the
complexity of asynchronous systems is manageable. More precisely, we set out to provide evidence for the
following statement:

Thesis.  Verification  and  planning  for  stochastic  processes  with  asynchronous  events  can  be  made  practical 
through  the  use  of  statistical  hypothesis  testing  and  phase-type  distributions. 

We will support this statement by developing a set of techniques and tools for verification and planning
with asynchronous events. In verification, we provide a unifying semantics for interpreting probabilistic
temporal logic formulae over general stochastic discrete event systems. We have developed a statistical
approach to probabilistic model checking, based on hypothesis testing and simulation. The main theoretical
results are Theorems 5.4 and 5.8, which establish the verification procedure for conjunctive and
nested probabilistic statements. We show, through empirical studies, that our approach compares well with
state-of-the-art numerical techniques for model checking Markov processes. We also show that the use of
memoization and heuristics for selecting the verification error of nested probabilistic operators can make
statistical verification of properties with nested probabilistic statements work in practice. Finally, we consider
the verification of so-called “black-box” systems, which are systems that have already been deployed
and cannot be simulated, and make explicit the assumptions required for such verification to produce reliable results.

In planning, we establish a framework for stochastic decision processes with asynchronous events. We
consider both goal directed and decision theoretic (reward oriented) planning. For goal directed planning,
we use our statistical model checking algorithm to verify plans. Plans that fail to satisfy a given goal
condition are repaired, and we rely on the execution traces generated during plan verification to find reasons
for failure. We show that the information obtained from the execution traces can help us understand why a
plan fails, and can also be used to guide automated plan repair. For decision theoretic planning, we introduce
the GSMDP model, and show how phase-type distributions can be used to approximate a GSMDP with a
continuous-time MDP. We show, through experiments, that the introduction of phases can help us produce
better policies (in terms of expected reward) by allowing us to take history dependence into account.




We would like to highlight two tools, in particular, that have come out of our research effort and are now
available to the public. These are Ymer¹, a tool for probabilistic model checking, and Tempastic-DTP²,
which is our decision theoretic planner for GSMDPs.


1.3  Overview  of  Thesis 

This thesis is divided into two parts, corresponding to the two different research problems that we address:
verification and planning. The two parts are to a large extent independent of each other. We rely on the
verification work when we discuss goal directed planning in Chapter 8, but only on an abstract level. The
separation into two largely independent parts is made with a heterogeneous audience in mind. The target
audience for the part on verification is the model checking community, while the part on planning primarily
targets researchers in artificial intelligence. To accommodate readers with a cross-disciplinary inclination,
we provide a comprehensive introduction in Chapter 2 to terminology, notation, and techniques that are
used extensively throughout the remainder of the thesis. Chapter 3 provides the context for our research
contribution with a discussion of related work in probabilistic verification and planning under uncertainty.

Part I consists of a thorough presentation and evaluation of our statistical approach to probabilistic
model checking. We start in Chapter 4 by introducing the unified temporal stochastic logic (UTSL) for
specifying properties of stochastic discrete event systems. UTSL represents a unification of Hansson and
Jonsson’s (1994) PCTL, which has a semantics defined for discrete-time Markov processes, and Baier et al.’s
(2003) version of CSL, which has a semantics defined for continuous-time Markov processes. We provide a
semantics for UTSL that is defined in terms of general stochastic discrete event systems.

Chapter 5 introduces a model checking algorithm for UTSL, based on statistical hypothesis testing.
This work originated in an effort to verify plans for complex stochastic temporal domains, with a focus
on probabilistic time-bounded reachability properties (Younes and Musliner 2002). Time-bounded CSL
properties were later considered (Younes and Simmons 2002b), although with an unsatisfactory solution for
conjunctive and nested probabilistic operators. These shortcomings have now been addressed, and a sound
and practical solution to the verification of properties with nested probabilistic operators is presented for the
first time in this thesis.

¹ http://www.cs.cmu.edu/~lorens/ymer.html
² http://www.cs.cmu.edu/~lorens/tempastic-dtp.html

Chapter 6 provides an empirical evaluation of our model checking algorithm and a comparison with
numerical solution methods. The comparative study extends a previously published (Younes et al. 2004)
comparison of statistical and numerical solution methods for probabilistic model checking. The results are
intended as an aid to practitioners when facing a choice between different solution techniques, or when
selecting parameters for a specific solution method.

The model checking algorithm presented in Chapter 5 relies on the ability to generate sample trajectories
for a stochastic discrete event system on demand. In Chapter 7, we consider a situation where this is not
possible, for example, if we want to verify an already deployed system for which we have no model. We
assume that we are provided with a finite set of sample trajectories, and show how to statistically verify
UTSL properties based on this limited source of information about a system. This chapter, which concludes
the part on verification, is based on a previously published technical report (Younes 2004).

In  Part  II,  we  consider  the  problem  of  planning  with  asynchronous  events  and  actions.  We  describe  two 
complementary  approaches.  Chapter  8  describes  a  goal  directed  approach.  We  present  a  general  planning 
framework  for  generating  stationary  policies  for  controllable  stochastic  discrete  event  systems  that  satisfy 
UTSL  goal  conditions.  The  statistical  model  checking  algorithm  is  used  for  policy  verification,  and  policies 
that  do  not  satisfy  a  given  goal  condition  are  repaired.  We  rely  on  the  sample  trajectories  generated  during 
the verification phase to guide the repair effort. This chapter is based on work reported at ICAPS in two
consecutive years (Younes et al. 2003; Younes and Simmons 2004a).

A decision theoretic approach to planning with asynchronous events and actions is presented in Chapter 9,
where we introduce the generalized semi-Markov decision process (GSMDP). We present the use of
continuous phase-type distributions to approximate a GSMDP with a continuous-time MDP, which can then
be solved exactly. We extend the work of Younes and Simmons (2004c) by considering additional techniques
for approximating a general distribution with a phase-type distribution. The “Bellman equation” for
a GSMDP first appeared in a workshop paper (Younes and Simmons 2004b).

Finally, Chapter 10 discusses directions for future work in verification and planning. For verification,
this includes statistical techniques for verifying steady-state properties and the use of symbolic data structures
for faster discrete event simulation. In planning, we call for a formal analysis of optimal GSMDP
planning and discuss the possibility of using value function approximation techniques to solve GSMDPs.


Chapter  2 


Background 


This  chapter  introduces  terminology  and  techniques  that  will  be  used  extensively  in  later  chapters.  Readers 
already familiar with concepts such as random variable, probability distribution, acceptance sampling, and
stochastic  process  may  still  find  it  useful  to  read  this  chapter,  as  our  notation  may  differ  from  what  they 
are  used  to.  In  particular,  this  is  the  case  for  standard  parametric  probability  distributions,  and  we  refer  the 
reader  to  Table  2.1  for  a  summary  of  our  notation  for  important  distributions. 

2.1  Random  Variables  and  Probability  Distributions 

Consider  the  chance  experiment  of  observing  the  outcome  of  a  die  roll.  The  possible  observations  are  the 
integers  1  through  6.  For  a  regular  die,  we  assume  that  each  outcome  is  equally  likely,  i.e.  outcome  i  is 
observed  with  probability  1/6.  Now,  consider  a  chance  experiment  that  consists  of  observing  the  duration 
of  a  phone  call.  The  outcome  of  this  experiment  is  a  positive  real  number,  rather  than  an  integer,  and  there 
is  some  probability  of  observing  a  call  with  a  duration  no  longer  than  t. 

Formally, we represent a chance experiment with a random variable (Feller 1957; Wadsworth and Bryan
1960), also called a variate. A random variable $X$ can take on any value in an outcome space $\Omega$, and we
associate a non-negative weight $f(x)$ with each possible outcome $x \in \Omega$. The outcome space, as illustrated
by the two examples in the previous paragraph, can be discrete or continuous. We assume, for simplicity,
that the outcome space is either the integers or the real numbers. In the former case, we call $X$ a discrete
random variable, while in the latter case $X$ is referred to as a continuous random variable. Impossible
outcomes, for example 7 in the die roll experiment, are assigned zero weight.

Figure 2.1: Probability density function for a discrete random variable.

Figure 2.2: Probability density function for a continuous random variable.

Figure 2.3: Cumulative distribution function for a discrete random variable.

Figure 2.4: Cumulative distribution function for a continuous random variable.

The total weight for the outcome space must equal unity. In other words, the weight function $f$ must
satisfy the condition $\sum_{x \in \Omega} f(x) = 1$ for a discrete outcome space, or $\int_{\Omega} f(x)\,dx = 1$ for a
continuous one. For discrete outcome spaces, $f(x)$ is simply the probability associated with outcome $x$. In
the continuous case, however, $f(x)$ is not a probability, and $f(x)$ can be greater than 1. For example, $f(x)$
is either 0 or 2 for a continuous uniform distribution over the interval $(0, 0.5)$. The function $f(x)$ is called
the probability density function for the random variable $X$. Figures 2.1 and 2.2 show the probability density
function for a discrete and a continuous random variable, respectively. The support of a probability
distribution is the subset of the outcome space $\Omega$ with positive weight. It is $\{1, 2, 3, 4, 5, 6\}$ for the
distribution in Figure 2.1 and $[0, \infty)$ for the distribution in Figure 2.2.

The probability that the value of $X$ is at most $t$, $\Pr[X \le t]$, is a function $F(t)$ called the cumulative
distribution function. We have $F(t) = \sum_{x=-\infty}^{t} f(x)$ for discrete random variables and
$F(t) = \int_{-\infty}^{t} f(x)\,dx$ for continuous random variables. Since $f(x)$ is non-negative for all values of $x$,
$F(t)$ is a non-decreasing function of $t$, and since the total weight is 1, $\lim_{t \to -\infty} F(t) = 0$ and
$\lim_{t \to \infty} F(t) = 1$. A probability distribution is positive if $F(0) = 0$.
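The definitions of $f$ and $F$ can be checked on the die example with a short Python sketch; `die_pmf` and `die_cdf` are illustrative names, and exact fractions are used so that no rounding enters the checks. The sketch assigns zero weight to impossible outcomes such as 7 and computes $F(t)$ by summing $f(x)$ over the integers $x \le t$.

```python
from fractions import Fraction


def die_pmf(x):
    """Weight f(x) of a fair six-sided die: 1/6 on {1, ..., 6}, 0 for impossible outcomes."""
    return Fraction(1, 6) if x in (1, 2, 3, 4, 5, 6) else Fraction(0)


def die_cdf(t):
    """F(t) = sum of f(x) over integers x <= t; only x in 1..6 carry positive weight."""
    return sum((die_pmf(x) for x in range(1, 7) if x <= t), Fraction(0))
```

Because the support is $\{1, \ldots, 6\}$, `die_cdf` is a step function: `die_cdf(3)` and `die_cdf(3.5)` both equal 1/2, and `die_cdf(0)` is 0, so this distribution is positive in the sense defined above.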

Figures  2.3  and  2.4  show  two  examples  of  cumulative  distribution  functions  for  positive  distributions. 
We can obtain new random variables as functions of existing random variables. In the board game
Monopoly, for example, a player rolls two dice at once and adds the outcome of the two rolls to determine
the number of steps to take on the board. Let $X_1$ and $X_2$ be random variables representing the individual
die rolls. The sum $X_1 + X_2$ is another random variable $Y$ representing the chance experiment of simultaneously
rolling two identical dice and summing up their outcomes. In general, a function $g(X_1, \ldots, X_n)$
of $n$ random variables is itself a random variable $Y$ with some probability density function and cumulative
distribution function.
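For the Monopoly example, the distribution of $Y = X_1 + X_2$ can be computed by enumerating all pairs of outcomes. The following Python sketch assumes that the two rolls are independent, so that each pair $(x_1, x_2)$ has probability $f(x_1) f(x_2)$; `sum_distribution` is a hypothetical helper that represents a discrete distribution as a dictionary from values to probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sum_distribution(pmf1, pmf2):
    """Distribution of Y = X1 + X2 for independent discrete X1, X2 given as {value: probability}."""
    dist = Counter()
    for (x1, p1), (x2, p2) in product(pmf1.items(), pmf2.items()):
        dist[x1 + x2] += p1 * p2   # independence: Pr[X1 = x1, X2 = x2] = p1 * p2
    return dict(dist)


die = {x: Fraction(1, 6) for x in range(1, 7)}
two_dice = sum_distribution(die, die)
```

The result has support $\{2, \ldots, 12\}$, with 7 the most likely total (probability 1/6) and 2 and 12 the least likely (probability 1/36 each).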

2.1.1  Expectation,  Variance,  and  Moments 

The probability density function or cumulative distribution function for a random variable $X$ fully characterizes
the chance experiment represented by $X$. It is common to present a set of summarizing statistics for
the experiment instead of the whole distribution function. The most commonly used summarizing statistic
is the mean, or expected value, of a random variable. The expected value of $X$, denoted $E[X]$, is defined
as $\mu = \sum_{x=-\infty}^{\infty} x f(x)$ for discrete distributions and $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ for continuous
distributions. The value $\mu$ represents the “expected outcome” of a chance experiment, but does not necessarily
correspond to a possible outcome. In the case of a single die throw, for example, we have
$\mu = (1 + 2 + \cdots + 6)/6 = 3.5$, which no throw can produce.

While the mean is a measure of location for a random variable, the variance of $X$, denoted $\mathrm{Var}[X]$ or
$\sigma^2$, is a measure of spread. It is defined as $\sigma^2 = E[(X - \mu)^2]$, where $\mu$ is the mean of $X$. The square root
of the variance, $\sigma$, is called the standard deviation, and is sometimes preferred as a measure of spread in
practice because $\sigma$ and $\mu$ are measured in the same unit. For example, if $\mu$ is the average length of a
phone call in seconds, then $\sigma$ measures the spread in seconds, while $\sigma^2$ gives a measure of spread in squared
seconds. The spread can also be specified using the coefficient of variation, defined as $c_v = \sigma/\mu$, or the
squared coefficient of variation ($c_v^2$), which gives a measure of spread that is relative to the location $\mu$.

The mean of a random variable is a special case of a set of summarizing statistics called moments. The
$i$th moment of a random variable $X$ is defined as $\mu_i = E[X^i]$. Obviously, the mean of $X$ is $\mu_1$. The variance,
$\sigma^2$, can be expressed using the first two moments:

$$\sigma^2 = E[(X - \mu_1)^2] = E[X^2] - 2\mu_1 E[X] + \mu_1^2 = \mu_2 - \mu_1^2 .$$

The squared coefficient of variation, $c_v^2$, is therefore equal to $\mu_2/\mu_1^2 - 1$.
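The moment identity above can be verified on the die example. The following Python sketch (with a hypothetical `moment` helper over a dictionary-based distribution) computes the variance both from the definition $E[(X - \mu)^2]$ and as $\mu_2 - \mu_1^2$, using exact fractions so the two results can be compared for equality.

```python
from fractions import Fraction


def moment(pmf, i):
    """i-th moment mu_i = E[X^i] of a discrete distribution given as {value: probability}."""
    return sum((p * x ** i for x, p in pmf.items()), Fraction(0))


die = {x: Fraction(1, 6) for x in range(1, 7)}
mu1, mu2 = moment(die, 1), moment(die, 2)
var_from_definition = sum((p * (x - mu1) ** 2 for x, p in die.items()), Fraction(0))  # E[(X - mu)^2]
var_from_moments = mu2 - mu1 ** 2                                                      # mu_2 - mu_1^2
cv2 = mu2 / mu1 ** 2 - 1                                                               # squared coefficient of variation
```

For a fair die this gives $\mu_1 = 7/2$, $\mu_2 = 91/6$, $\sigma^2 = 35/12$ by both routes, and $c_v^2 = 5/21$.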
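\begin{tabular}{llll}
Distribution & $F(x)$ & $\mu$ & $\sigma^2$ \\
\hline
Bernoulli & $\begin{cases} 0 & \text{if } x < 0 \\ 1 - p & \text{if } 0 \le x < 1 \\ 1 & \text{if } x \ge 1 \end{cases}$ & $p$ & $p(1 - p)$ \\
Geometric, $G(p)$ & $1 - (1 - p)^x \quad (x \ge 0)$ & $1/p$ & $(1 - p)/p^2$ \\
Binomial, $B(n, p)$ & $\sum_{i=0}^{x} \binom{n}{i} p^i (1 - p)^{n-i} \quad (0 \le x \le n)$ & $np$ & $np(1 - p)$ \\
Uniform, $U(a, b)$ & $\begin{cases} 0 & \text{if } x < a \\ (x - a)/(b - a) & \text{if } a \le x < b \\ 1 & \text{if } x \ge b \end{cases}$ & $(a + b)/2$ & $(b - a)^2/12$ \\
Exponential, $\mathrm{Exp}(\lambda)$ & $1 - e^{-\lambda x} \quad (x \ge 0)$ & $1/\lambda$ & $1/\lambda^2$ \\
Weibull, $W(\eta, \beta)$ & $1 - e^{-(x/\eta)^{\beta}} \quad (x \ge 0)$ & $\eta\,\Gamma(1 + \beta^{-1})$ & $\eta^2\bigl(\Gamma(1 + 2\beta^{-1}) - \Gamma^2(1 + \beta^{-1})\bigr)$ \\
Lognormal, $L(\mu, s)$ & $\Phi\bigl(s^{-1}\log(x/\mu) + s/2\bigr) \quad (x > 0)$ & $\mu$ & $\mu^2(e^{s^2} - 1)$ \\
\end{tabular}

Table 2.1: Common parametric probability distributions.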


2.1.2  Parametric  Distributions 

A  probability  distribution  can  be  almost  arbitrarily  complex,  but  many  important  phenomena  in  nature  can 
be  fairly  accurately  described  using  only  a  few  parameters.  We  call  a  distribution  parametric  if  the  shape 
of its distribution function is determined by the values of a finite number of parameters. Table 2.1 shows
the  cumulative  distribution  function,  mean,  and  variance  for  seven  parametric  distributions  that  will  occur 
frequently  in  this  thesis.  Next,  we  describe  each  of  these  distributions  in  more  detail. 

Let  the  random  variable  X  represent  the  chance  experiment  of  tossing  an  unbiased  coin.  The  probability 
distribution  associated  with  X  can  be  specified  using  the  single  parameter  p  =  1/2,  and  is  an  example  of 
a  Bernoulli  distribution.  The  random  variable  X  is  called  a  Bernoulli  variate  and  the  chance  experiment 
represented by X is a Bernoulli trial. In general, the Bernoulli distribution can be used to model any chance
experiment  with  two  distinct  outcomes,  typically  encoded  by  the  integers  0  and  1,  and  with  a  probability  p 
of  outcome  1  occurring. 

Next, consider an experiment where we toss a coin repeatedly until we get a head. Let $X$ be a random
variable with value equal to the number of coin tosses in an experiment. In this case, $X$ is said to have a
geometric distribution with parameter $p = 1/2$. The probability of observing that $X$ has value $x$ (i.e. that
$x$ coin tosses are required to get one head in a specific experiment) is $p(1 - p)^{x-1}$, the probability of
$x - 1$ tails followed by a head, which is the probability density function for the geometric distribution. The
probability of observing $x$ tails in a row is $(1 - p)^x$, and $X > x$ exactly when the first $x$ tosses are all
tails, so the cumulative distribution function is $F(x) = 1 - (1 - p)^x$.
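The following Python sketch checks that the probability density function and cumulative distribution function of the geometric distribution fit together, i.e. that summing $p(1 - p)^{k-1}$ for $k = 1, \ldots, x$ gives $1 - (1 - p)^x$. The helper names are illustrative, and $x$ is assumed to be an integer.

```python
from fractions import Fraction


def geometric_pmf(x, p):
    """Pr[X = x]: x - 1 tails followed by a head, for integer x >= 1."""
    return p * (1 - p) ** (x - 1) if x >= 1 else 0


def geometric_cdf(x, p):
    """Pr[X <= x] = 1 - Pr[the first x tosses are all tails] = 1 - (1 - p)^x, for integer x >= 0."""
    return 1 - (1 - p) ** x if x >= 0 else 0
```

With a fair coin (`p = Fraction(1, 2)`), `geometric_pmf(3, p)` is 1/8 and `geometric_cdf(3, p)` is 7/8; the partial sums of the density match the distribution function exactly, and the mean approaches $1/p = 2$ tosses.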

Let $X_1, \ldots, X_n$ be $n$ independent and identically distributed Bernoulli variates with parameter $p$. The
random variable $Y = \sum_{i=1}^{n} X_i$ then has a binomial distribution with parameters $n$ and $p$, denoted $B(n, p)$. If
we carry out $n$ independent coin tosses, for example, then the number of heads that we observe is binomially
distributed with $p = 1/2$. The binomial distribution will play a central role in the next section, when we
discuss acceptance sampling, which is the technique we will later use for statistical probabilistic model
checking.
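The construction of a binomial variate as a sum of Bernoulli variates can be checked empirically. In the following Python sketch, `binomial_pmf` evaluates $\Pr[Y = k] = \binom{n}{k} p^k (1 - p)^{n-k}$ with `math.comb` (Python 3.8 or later), and `bernoulli_sum` is a hypothetical helper that draws $Y$ by adding $n$ independent Bernoulli($p$) outcomes.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Pr[Y = k] for Y ~ B(n, p): choose which k of the n Bernoulli trials yield 1."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def bernoulli_sum(n, p, rng):
    """Draw Y = X_1 + ... + X_n, where each X_i is an independent Bernoulli(p) variate."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))
```

For ten fair coin tosses, the empirical frequency of observing exactly five heads over many draws of `bernoulli_sum(10, 0.5, rng)` should be close to `binomial_pmf(5, 10, 0.5)`, which is $252/1024 \approx 0.246$.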

A random variable representing a die roll has a discrete uniform distribution with $f(x) = 1/6$ for
$x \in \{1, \ldots, 6\}$ (Figures 2.1 and 2.3 plot $f(x)$ and $F(x)$, respectively, for this distribution). The uniform
distribution can also be defined over a continuous interval $(a, b)$, with $f(x) = 1/(b - a)$ and
$F(x) = (x - a)/(b - a)$ for $x \in (a, b)$. The uniform distribution has finite support, unlike the geometric
distribution and the three continuous distributions mentioned below, which all have infinite support.
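For the continuous uniform distribution, a short Python sketch illustrates that a density value can exceed 1 while the total weight is still 1. The helper names are illustrative, and the integrals are approximated with a simple midpoint rule (an assumption made only for this sketch); the results agree with the mean $(a + b)/2$ and variance $(b - a)^2/12$ listed in Table 2.1.

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b): constant 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a < x < b else 0.0


def uniform_cdf(x, a, b):
    """F(x) = (x - a)/(b - a) on [a, b), with F = 0 below a and F = 1 from b on."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def midpoint_integral(g, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))
```

Over $(0, 0.5)$ the density is 2 everywhere inside the interval, yet `midpoint_integral(lambda x: uniform_pdf(x, 0.0, 0.5), 0.0, 0.5)` is 1 up to rounding.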

The exponential distribution, with cumulative distribution function $F(x) = 1 - e^{-\lambda x}$, is one of the most
widely used continuous distributions due to its favorable analytical properties. The parameter $\lambda$ is the rate
of the distribution, for example representing the failure rate of an electrical component or the arrival rate
of customers at a post office. Figures 2.2 and 2.4 plot $f(x)$ and $F(x)$, respectively, for the exponential
distribution with $\lambda = 1$. The exponential distribution is memoryless. This means that if $X$ is a random
variable with an exponential distribution, then $\Pr[X > t + s \mid X > t] = \Pr[X > s]$ for all $s, t \ge 0$, which
follows from $\Pr[X > t] = e^{-\lambda t}$. The geometric distribution, which in many ways can be seen as a discrete
version of the exponential distribution, is also memoryless, and these are in fact the only memoryless
distributions (Feller 1957, p. 305). The memoryless property is essential for analytical tractability in many
applications.
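The memoryless property can be checked numerically by comparing $\Pr[X > t + s \mid X > t]$ with $\Pr[X > s]$. The following Python sketch uses illustrative helper names and the survival functions $e^{-\lambda t}$ and $e^{-(t/\eta)^\beta}$ implied by the distribution functions in Table 2.1; a Weibull distribution with $\beta = 2$ is included as a contrast, since it is not memoryless.

```python
import math


def exp_survival(t, lam):
    """Pr[X > t] = e^{-lam t} for X ~ Exp(lam) and t >= 0."""
    return math.exp(-lam * t)


def weibull_survival(t, eta, beta):
    """Pr[X > t] = e^{-(t/eta)^beta} for X ~ W(eta, beta) and t >= 0."""
    return math.exp(-((t / eta) ** beta))


def conditional_survival(survival, t, s):
    """Pr[X > t + s | X > t] = Pr[X > t + s] / Pr[X > t] for a given survival function."""
    return survival(t + s) / survival(t)
```

For the exponential distribution the conditional and unconditional survival probabilities coincide for every $t$ and $s$. For the Weibull distribution with $\eta = 1$ and $\beta = 2$, surviving one more time unit after already surviving one unit has probability $e^{-3} \approx 0.05$, well below $\Pr[X > 1] = e^{-1} \approx 0.37$, reflecting an increasing failure rate.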

Not all phenomena in the real world can be properly captured by a memoryless distribution. Component lifetime, for example, is often not memoryless. Failure may be more likely early on, during a warm-up period, than when a system has been running for a while, or the failure rate may increase with time due to material fatigue. The Weibull distribution (Weibull 1951), with cumulative distribution function $F(x) = 1 - e^{-(x/\eta)^{\beta}}$, is commonly used in reliability engineering for this purpose. The parameter $\eta$ is a scale parameter, while $\beta$ is a shape parameter, with $0 < \beta < 1$ giving a decreasing failure rate and $\beta > 1$ giving an increasing failure rate. The mean and variance of a Weibull distribution are defined in terms of the gamma function, $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\, dt$, as shown in Table 2.1. If $\beta$ is equal to 1, then the Weibull distribution is simply an exponential distribution with rate $1/\eta$. Figure 2.5 shows the probability density function and cumulative distribution function for three different values of $\beta$.

Figure 2.5: Probability density function (left) and cumulative distribution function (right) for the Weibull distribution.
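The Weibull formulas can be sketched in Python as follows. This is an illustrative sketch with hypothetical helper names; since Table 2.1 is not reproduced here, it assumes the standard expressions $\eta\,\Gamma(1 + 1/\beta)$ for the mean and $\eta^2[\Gamma(1 + 2/\beta) - \Gamma(1 + 1/\beta)^2]$ for the variance, together with the failure (hazard) rate $f(x)/(1 - F(x)) = (\beta/\eta)(x/\eta)^{\beta - 1}$ that follows from the CDF above.

```python
import math


def weibull_cdf(x: float, eta: float, beta: float) -> float:
    """F(x) = 1 - exp(-(x / eta) ** beta) for x >= 0, with scale eta and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / eta) ** beta))


def weibull_mean_var(eta: float, beta: float) -> tuple[float, float]:
    """Mean and variance of the Weibull distribution, expressed through the gamma function."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * g1, eta**2 * (g2 - g1**2)


def weibull_hazard(x: float, eta: float, beta: float) -> float:
    """Failure rate f(x) / (1 - F(x)) = (beta / eta) * (x / eta) ** (beta - 1) for x > 0."""
    return (beta / eta) * (x / eta) ** (beta - 1.0)


# beta = 1 reduces to an exponential distribution with rate 1 / eta: the two values agree.
print(weibull_cdf(3.0, 2.0, 1.0), 1.0 - math.exp(-3.0 / 2.0))
# Decreasing (beta < 1), constant (beta = 1), and increasing (beta > 1) failure rates.
for beta in (0.5, 1.0, 2.0):
    print(beta, [round(weibull_hazard(x, 1.0, beta), 3) for x in (0.5, 1.0, 2.0)])
```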

The lognormal distribution is another probability distribution commonly used in reliability engineering. If $X$ is a random variable with a lognormal distribution, then $Y = \log X$ is a normal variate. The cumulative distribution function for the standard normal distribution ($\mu = 0$ and $\sigma = 1$) is given by the formula

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt, \tag{2.1}$$

and Table 2.1 shows the distribution function for the lognormal distribution in terms of $\Phi(x)$.
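A minimal Python sketch of (2.1) and the lognormal CDF, built on `math.erf` from the standard library, is shown below. It assumes the common parameterization in which $\log X$ has mean $\mu$ and standard deviation $\sigma$, so that $F(x) = \Phi((\log x - \mu)/\sigma)$ for $x > 0$; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def std_normal_cdf(x: float) -> float:
    """Phi(x) from (2.1), evaluated via the error function: (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """Pr[X <= x] when log X ~ N(mu, sigma^2): Phi((log x - mu) / sigma) for x > 0."""
    if x <= 0:
        return 0.0
    return std_normal_cdf((math.log(x) - mu) / sigma)


print(std_normal_cdf(1.96))                     # about 0.975
print(lognormal_cdf(math.exp(0.3), 0.3, 0.8))   # 0.5 up to rounding: e^mu is the median
print(NormalDist().cdf(1.96))                   # standard-library cross-check of Phi
```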

2.1.3 Phase-Type Distributions and Approximation Techniques

The exponential distribution, with its memoryless property, is often used in models of stochastic systems. This yields models for which tractable solution techniques exist for many problems (e.g. model checking and planning). Phase-type distributions (Neuts 1975, 1981), both discrete and continuous, generalize the exponential distribution to permit memory dependence in the form of phases. We will use phase-type distributions in Chapter 9 to approximate non-exponential parametric distributions for the purpose of solving decision theoretic planning problems with asynchronous events.

Erlang (1917) was the first to consider a generalization of the exponential distribution that preserves much of its analytic tractability. Let $X_1, \ldots, X_n$ be $n$ independent random variables, all having an exponential distribution with rate $\lambda$. The random variable $Y = \sum_{i=1}^{n} X_i$ is then said to have an Erlang distribution with parameters $n$ and $\lambda$. The Erlang distribution can be thought of as a chain of $n$ phases where the time spent in each phase before transitioning to the next phase is exponentially distributed with rate $\lambda$ (Figure 2.6). The random variable $Y$ represents the time from entry of the first phase until exit of the last phase. A generalized Erlang distribution includes the possibility of exiting the chain immediately after the first phase: with probability $p$ the chain continues to the second phase, and with probability $1 - p$ it exits.

Figure 2.6: Erlang distribution.

Figure 2.7: Coxian distribution.

A Coxian distribution (Cox 1955) is a further generalization of the Erlang distribution, permitting phase-dependent transition rates and a probability $q_i = 1 - p_i$ of bypassing the remaining phases after exiting phase $i$. Figure 2.7 shows an $n$-phase Coxian distribution. Note that a Coxian distribution with $n$ phases has $2n - 1$ parameters, while an $n$-phase Erlang distribution has only a single parameter (the rate $\lambda$).

The Erlang and Coxian distributions are special cases of the class of phase-type distributions. In general, a phase-type distribution with $n$ phases represents the time from entry until absorption in a Markov process (see Section 2.3.3) with $n$ transient states and a single absorbing state. We are primarily interested in continuous phase-type distributions, as this thesis is concerned with asynchronous systems, which are best represented using a continuous model of time. The general form of an $n$-phase continuous phase-type distribution is specified using $n^2 + 2n$ parameters:

• $\lambda_i$, for $1 \le i \le n$, representing the exit rate for phase $i$.

• $p_{ij}$, for $1 \le i, j \le n$, representing the probability that phase $i$ is followed by phase $j$. The probability $q_i = 1 - \sum_{j=1}^{n} p_{ij}$ is the probability of absorption immediately following phase $i$.

• $\alpha_i$, for $1 \le i \le n$, representing the probability that the initial phase is $i$.

If we define an $n \times n$ matrix $Q$, with elements $Q_{ii} = -\lambda_i (1 - p_{ii})$ and $Q_{ij} = \lambda_i p_{ij}$ ($i \ne j$), and a row vector $\alpha = [\alpha_i]$, then the cumulative distribution function for a continuous phase-type distribution is given by $F(x) = 1 - \alpha e^{Qx} \mathbf{e}$, where $\mathbf{e}$ is a column vector of ones of size $n$. The $k$th moment of the distribution is $\mu_k = k!\, \alpha (-Q)^{-k} \mathbf{e}$. It is common to consider only acyclic phase-type distributions, where phase $j$ can be reached only from phases $i < j$, because they require fewer parameters.
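The moment formula $\mu_k = k!\, \alpha (-Q)^{-k} \mathbf{e}$ can be evaluated without inverting $-Q$ explicitly, by solving $k$ linear systems in turn. The sketch below does this in plain Python with a small hand-written Gaussian elimination; the helpers `solve`, `ph_moment`, and `erlang_generator` are hypothetical, and only small, well-conditioned $Q$ are assumed. It builds $Q$ exactly as defined above, so an $n$-phase Erlang chain has $Q_{ii} = -\lambda$ and $Q_{i,i+1} = \lambda$, and a two-phase Coxian has $Q_{12} = \lambda_1 p$.

```python
from math import factorial


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ph_moment(alpha: list[float], q: list[list[float]], k: int) -> float:
    """k-th moment mu_k = k! * alpha (-Q)^{-k} e of a continuous phase-type distribution."""
    neg_q = [[-v for v in row] for row in q]
    vec = [1.0] * len(q)  # e: column vector of ones
    for _ in range(k):
        vec = solve(neg_q, vec)  # vec <- (-Q)^{-1} vec
    return factorial(k) * sum(a_i * v_i for a_i, v_i in zip(alpha, vec))


def erlang_generator(n: int, rate: float) -> list[list[float]]:
    """Q for an n-phase Erlang chain: leave each phase at `rate`, then move on (or absorb)."""
    return [[-rate if j == i else (rate if j == i + 1 else 0.0) for j in range(n)]
            for i in range(n)]


# Three-phase Erlang with rate 6.
alpha3 = [1.0, 0.0, 0.0]
q3 = erlang_generator(3, 6.0)
print(ph_moment(alpha3, q3, 1), ph_moment(alpha3, q3, 2))  # 0.5 and 1/3, up to rounding

# Two-phase Coxian: rates 1 and 1/10, continue to phase 2 with probability 1/10.
q_cox = [[-1.0, 0.1], [0.0, -0.1]]
print(ph_moment([1.0, 0.0], q_cox, 1), ph_moment([1.0, 0.0], q_cox, 2))  # about 2 and 24
```

The Erlang check gives the mean $1/2$ and second moment $1/3$ of $U(0, 1)$, and the Coxian check gives moments consistent with $\mu_1 = 2$ and $cv^2 = 5$, matching the two examples discussed later in this section.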

We can use a phase-type distribution PH to approximate a general distribution $G$, for example a Weibull or lognormal distribution. The most straightforward approximation technique is the method of moments, where the objective is to match the first $k$ moments of $G$ and PH. When using the method of moments, it is desirable to match as many moments of $G$ as possible, but we will typically need more phases to match more moments, so there is a trade-off between accuracy and complexity of the approximate model. The objective is often to find a phase-type distribution that matches a fixed number of moments and is minimal (in terms of the number of phases), or close to minimal, within a certain class of phase-type distributions (e.g. acyclic phase-type distributions).

We can easily match a single moment of a general distribution $G$ by using an exponential distribution with rate $1/\mu_1$, but this typically yields a poor approximation of $G$. It is possible to match the first two moments of any positive distribution using either a generalized Erlang distribution or a two-phase Coxian distribution. If the squared coefficient of variation, $cv^2$ (the variance divided by the squared mean), is less than 1, then we can use a generalized Erlang distribution with the following parameters (Sauer and Chandy 1975; Marie 1980):


n  = 


1 


err 


A  = 


1  —  p  +  np 
Pi 


p  =  1  — 


2 n  ■  cv 2  +  n  —  2  —  \J  n2  +  4  —  4n  •  cv 2 


2  (n  —  1)  (cv2  +  1) 

For example, a uniform distribution $U(0, 1)$ ($\mu_1 = 1/2$ and $cv^2 = 1/3$) can be approximated by a three-phase (generalized) Erlang distribution with $p = 1$ and $\lambda = 6$. For distributions with $cv^2 > 1/2$, we match the first two moments with a two-phase Coxian distribution with parameters $\lambda_1 = 2/\mu_1$, $\lambda_2 = 1/(\mu_1 \cdot cv^2)$, and $p = 1/(2 \cdot cv^2)$ (Marie 1980). For example, a Weibull distribution $W(1, 1/2)$ has $\mu_1 = 2$ and $cv^2 = 5$, and can therefore be approximated by a two-phase Coxian distribution with $\lambda_1 = 1$, $\lambda_2 = 1/10$, and $p = 1/10$. Whitt (1982) and Altiok (1985) show how to find a phase-type distribution with only two phases that matches the first three moments of a general distribution, provided that $cv^2 > 1$ and $\mu_3 > 3\mu_2^2/(2\mu_1)$. Telek and Heindl (2002) provide bounds on $\mu_3$, with $cv^2 > 1/2$, for which a two-phase Coxian distribution can be used to match three moments. Johnson and Taaffe (1989) use a mixture of Erlang distributions to match the first three moments of any positive distribution, but the resulting phase-type distribution is a factor of two from minimal in the class of acyclic phase-type distributions. Johnson and Taaffe (1990) describe an approach for matching three moments based on nonlinear programming, which results in close-to-minimal acyclic phase-type distributions. An analytic solution, combining an Erlang distribution with a two-phase Coxian distribution, for matching three moments with close-to-minimal acyclic phase-type distributions is presented by Osogami and Harchol-Balter (2003).
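The two closed-form two-moment recipes can be written as small Python functions. This is a sketch of the stated formulas only, with hypothetical function names: `generalized_erlang_fit` returns the number of phases $n$, the probability $p$ of continuing after the first phase, and the common rate $\lambda$, while `coxian2_fit` returns $(\lambda_1, \lambda_2, p)$. The tiny tolerance inside the ceiling is an implementation choice of this sketch, not part of the original formula; it keeps $1/cv^2$ from rounding just above an integer.

```python
import math


def generalized_erlang_fit(mu1: float, cv2: float) -> tuple[int, float, float]:
    """Two-moment match for 0 < cv2 < 1 (Sauer and Chandy 1975; Marie 1980).

    Returns (n, p, lam): n phases, each left at rate lam; after phase 1 the chain
    continues with probability p and exits with probability 1 - p.
    """
    if not 0.0 < cv2 < 1.0:
        raise ValueError("the generalized Erlang fit needs 0 < cv2 < 1")
    # Tolerance guards against floating-point rounding; max(2, ...) avoids n = 1 near cv2 = 1.
    n = max(2, math.ceil(1.0 / cv2 - 1e-9))
    root = math.sqrt(n * n + 4.0 - 4.0 * n * cv2)
    p = 1.0 - (2.0 * n * cv2 + n - 2.0 - root) / (2.0 * (n - 1) * (cv2 + 1.0))
    lam = (1.0 - p + n * p) / mu1
    return n, p, lam


def coxian2_fit(mu1: float, cv2: float) -> tuple[float, float, float]:
    """Two-moment match for cv2 > 1/2 (Marie 1980): returns (lam1, lam2, p)."""
    if cv2 <= 0.5:
        raise ValueError("the two-phase Coxian fit needs cv2 > 1/2")
    return 2.0 / mu1, 1.0 / (mu1 * cv2), 1.0 / (2.0 * cv2)


print(generalized_erlang_fit(0.5, 1.0 / 3.0))  # U(0, 1): n = 3, p = 1, lam = 6 (up to rounding)
print(coxian2_fit(2.0, 5.0))                   # Weibull W(1, 1/2): (1.0, 0.1, 0.1)
```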

It is possible to match the first few moments of a distribution without obtaining a good fit for the distribution function. For example, the first two moments do not reveal whether the distribution has multiple modes. Instead of matching moments of a distribution, we can try to match the shape of the distribution function. The Kullback-Leibler divergence (KL-divergence), or relative entropy, is a popular similarity measure for distribution functions. Let $f$ and $g$ be two probability density functions. The KL-divergence of $f$ and $g$ is defined as follows (Kullback and Leibler 1951, p. 80):¹

$$KL(f, g) = \int_{-\infty}^{\infty} f(x) \log \frac{f(x)}{g(x)}\, dx \tag{2.2}$$
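Definition (2.2) can be approximated numerically. The sketch below (hypothetical helper name, midpoint rule, plain Python) assumes that $f$ vanishes outside a known interval $[a, b]$ and that $g > 0$ wherever $f > 0$. As a check, it compares $U(0, 1)$ with the exponential distribution of rate 2, the single-phase fit that matches the mean $1/2$; for this pair the integral has the closed form $\int_0^1 (2x - \log 2)\, dx = 1 - \log 2$.

```python
import math


def kl_divergence(f, g, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate KL(f, g) from (2.2) with the midpoint rule on [a, b].

    Assumes f is zero outside [a, b] and g(x) > 0 wherever f(x) > 0;
    points with f(x) = 0 contribute nothing (0 log 0 = 0).
    """
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        fx = f(x)
        if fx > 0.0:
            total += fx * math.log(fx / g(x))
    return total * h


uniform_pdf = lambda x: 1.0 if 0.0 < x < 1.0 else 0.0             # U(0, 1)
exp2_pdf = lambda x: 2.0 * math.exp(-2.0 * x) if x >= 0 else 0.0  # rate 2, mean 1/2
print(kl_divergence(uniform_pdf, exp2_pdf, 0.0, 1.0))  # closed form 1 - log 2, about 0.307
```

Swapping the arguments would make the divergence infinite, because the exponential density is positive for $x > 1$ where the uniform density is zero; this is one concrete instance of the asymmetry noted in footnote 1.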

Asmussen et al. (1996) use the EM (Expectation-Maximization) algorithm (Dempster et al. 1977) to fit a general phase-type distribution to an arbitrary continuous distribution, minimizing the KL-divergence. Bobbio and Cumani (1992) present a maximum likelihood estimation algorithm for fitting an acyclic phase-type distribution to a continuous distribution. For both fitting algorithms, the user selects the number of phases to use instead of the number of moments to match, with more phases typically resulting in a better fit. These approaches are computationally more costly than the method of moments. The number of iterations required for the EM algorithm to converge tends to grow with the number of phases. Convergence can be reached faster by imposing restrictions on the structure of the phase-type distribution, for example by matching a sum of $n$ exponential distributions or an $n$-phase Coxian distribution rather than a general phase-type distribution. Figure 2.8 shows the probability density function for the uniform distribution $U(0, 1)$ and five different phase-type distributions (two obtained by matching moments, and three obtained through use of the EM algorithm). We need only a single phase to match the first moment of $U(0, 1)$, and we need three phases to match the first two moments (achieved by an Erlang distribution, as mentioned earlier).

A continuous distribution can also be approximated by a discrete phase-type distribution (Bobbio et al. 2003, 2004). An advantage of using discrete, rather than continuous, phase-type distributions is that a lower coefficient of variation can be achieved with the same number of phases. It is known that with $n$ phases, $cv^2$ is at least $1/n$ for a continuous phase-type distribution, with $1/n$ achieved exactly by an $n$-phase Erlang distribution (Aldous and Shepp 1987). Discrete phase-type distributions can also capture distributions with finite support and deterministic distributions, while continuous phase-type distributions always have infinite support. One clear disadvantage of discrete-time approximations of continuous-time systems, however, is that coincident events must be taken into consideration. With continuous distributions, the probability of two events occurring at the same time is zero, but if we discretize time, two events may occur in the same interval of time. This can significantly increase the complexity of any analysis of the model, and is particularly a problem for analyses of systems with asynchronous events.

¹ The KL-divergence can be thought of as the distance between two probability density functions, although technically it is not a true distance measure because it is not symmetric.

Figure 2.8: Phase-type fitting for uniform distribution. The KL-divergence for each phase-type distribution is shown in parentheses.


2.2 Acceptance Sampling with Bernoulli Trials

A probabilistic model checking problem can be phrased as a hypothesis testing problem. We will take advantage of this in Chapter 5 when presenting a statistical approach to probabilistic model checking. As an example of a hypothesis testing problem, consider a manufacturing process that produces units of some product. Each manufactured unit is either functional or defective, and assume that there is some probability $p$, unknown to us, of the process producing a functional unit. Naturally, we want $p$ to be high, meaning that the expected fraction of functional units, in a lot of produced units, is high. Let $\theta$ be the lowest acceptable value of $p$. By inspecting a limited number of manufactured units, we want to determine if the manufacturing process is acceptable (i.e. $p \ge \theta$). This section discusses how to solve problems like this statistically using a technique called acceptance sampling, which we will later use for probabilistic model checking.

2.2.1 Problem Formulation

Let $X_i$ be a random variable having a Bernoulli distribution with parameter $p$, i.e. $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$. An observation $x_i$ of $X_i$ has value either 0 or 1. For the manufacturing process mentioned above, $x_i$ is 1 if the $i$th unit that we observe is functional, and 0 if it is defective. Each random variable $X_i$, called a Bernoulli trial, represents the inspection of a manufactured unit, and the observation $x_i$ represents the outcome of the inspection. We are interested in testing whether the parameter $p$ of the Bernoulli distribution is above or below some given threshold $\theta$. More specifically, we want to test the hypothesis $H: p \ge \theta$ against the alternative hypothesis $K: p < \theta$.

We are going to consider statistical approaches for solving this hypothesis testing problem. Any statistical test procedure has some probability of accepting a false hypothesis, but this is tolerable as long as the probability of error is sufficiently low. In particular, the test procedure should limit the probability of accepting the hypothesis $K$ when $H$ holds (known as a type I error, or false negative) to $\alpha$, and the probability of accepting $H$ when $K$ holds (a type II error, or false positive) should be at most $\beta$. We generally assume that both $\alpha$ and $\beta$ are less than 1/2. Figure 2.9 plots the probability of accepting $H$ as a function of $p$, denoted $L_p$, for a hypothetical acceptance sampling test with ideal performance, in the sense that the probability of a type I error is exactly $\alpha$ and the probability of a type II error is exactly $\beta$. The parameters $\alpha$ and $\beta$ determine the strength of an acceptance sampling test.

The above problem formulation is flawed, however, as it essentially requires that we can differentiate between $p = \theta$ and $p = \theta - \epsilon$ for arbitrary $\epsilon > 0$. For $p = \theta$, we require the probability of accepting $H$ to be at least $1 - \alpha$, but for $p$ only infinitesimally smaller than $\theta$, the probability of accepting $H$ is required to be at most $\beta$. For this to work, we either need to conduct exhaustive sampling, which is impractical if the sample population is large, or we need to have $1 - \alpha = \beta$, which means that if one error probability is set low then the other is required to be high. In order to avoid exhaustive sampling and obtain the desired control over the two error probabilities, we relax the hypothesis testing problem by introducing two thresholds $p_0$ and $p_1$, with $p_0 > p_1$. Instead of testing $H$ against $K$, we choose to test the hypothesis $H_0: p \ge p_0$ against the alternative hypothesis $H_1: p \le p_1$. We require that the probability of accepting $H_1$ when $H_0$ holds is at most $\alpha$, and the probability of accepting $H_0$ when $H_1$ holds is at most $\beta$. Figure 2.10 shows the typical performance characteristic for a realistic acceptance sampling test. If the value of $p$ is between $p_1$ and $p_0$, we are indifferent with respect to which hypothesis is accepted, and both hypotheses are in fact false in this case. The region $(p_1, p_0)$ is referred to as the indifference region, and it is shown as a gray area in Figure 2.10.

Figure 2.9: Probability, $L_p$, of accepting the hypothesis $H: p \ge \theta$ as a function of $p$ for a hypothetical statistical test.

Figure 2.10: Probability, $L_p$, of accepting the hypothesis $H_0: p \ge p_0$ as a function of $p$ for a statistical test with indifference region.

We will often find it appropriate to define the two thresholds $p_0$ and $p_1$ in terms of a single threshold $\theta$ and the half-width $\delta$ of the indifference region, i.e. $p_0 = \theta + \delta$ and $p_1 = \theta - \delta$. Testing $H_0$ against $H_1$ can then be interpreted as testing the hypothesis $H: p \ge \theta$ against the alternative hypothesis $K: p < \theta$, as originally specified, where acceptance of $H_0$ results in acceptance of $H$ and acceptance of $H_1$ results in acceptance of $K$. The probability of accepting $H$ is therefore at least $1 - \alpha$ if $p \ge \theta + \delta$ and at most $\beta$ if $p \le \theta - \delta$. If $|p - \theta| < \delta$, then the test gives no bounds on the probability of accepting a false hypothesis. In this case, however, we say that $p$ is sufficiently close to the threshold $\theta$ that we are indifferent with respect to which of the two hypotheses, $H$ or $K$, is accepted. By narrowing the indifference region, we can get arbitrarily close to the ideal performance shown in Figure 2.9.

We now turn to the problem of finding a test procedure with the desired characteristics. From now on, a set of $n$ observations is referred to as a sample. We first present a test procedure that uses samples of fixed size, and then present a sequential test procedure where the sample size required for a test of a given strength is a random variable. We will see that the sequential test procedure, while giving no upper bound on the sample size for any given run, typically requires far smaller samples on average than a test procedure using samples of predetermined size.

2.2.2 Acceptance Sampling with Fixed-Size Samples

A sample of size $n$ consists of $n$ observations, $x_1, \ldots, x_n$, of the Bernoulli variates $X_1, \ldots, X_n$ that represent our experiment. To test the hypothesis $H_0: p \ge p_0$ against the alternative hypothesis $H_1: p \le p_1$ using a single sample of size $n$, we specify a constant $c$. If $\sum_{i=1}^{n} x_i$ is greater than $c$, then hypothesis $H_0$ is accepted. Otherwise, if the sum is at most $c$, then hypothesis $H_1$ is accepted. The problem is now to find $n$ and $c$ such that $H_1$ is accepted with probability at most $\alpha$ when $H_0$ holds, and $H_0$ is accepted with probability at most $\beta$ when $H_1$ holds. The pair $(n, c)$ represents an acceptance sampling test that uses a single fixed-size sample, and we refer to this pair as a single sampling plan (Grubbs 1949; Duncan 1974).

Optimal Single Sampling Plans

The probability distribution of a sum of $n$ Bernoulli variates with parameter $p$ is a binomial distribution with parameters $n$ and $p$, denoted $B(n, p)$. The probability of $\sum_{i=1}^{n} X_i$ being at most $c$ is therefore given by the cumulative distribution function for $B(n, p)$:

$$F(c; n, p) = \sum_{i=0}^{c} \binom{n}{i} p^i (1 - p)^{n-i} \tag{2.3}$$

Thus, with probability $F(c; n, p)$ we accept hypothesis $H_1$ using a single sampling plan $(n, c)$, and consequently hypothesis $H_0$ is accepted with probability $1 - F(c; n, p)$ by the same sampling plan.

If we can find a pair $(n, c)$ simultaneously satisfying $F(c; n, p) \le \alpha$ for all $p \ge p_0$ and $1 - F(c; n, p) \le \beta$ for all $p \le p_1$, then we have a single sampling plan with strength $(\alpha, \beta)$ for testing $H_0$ against $H_1$. For fixed $c$ and $n$, $F(c; n, p)$ is a non-increasing function of $p$ in the interval $[0, 1]$. This means that $F(c; n, p_0) \le \alpha$ implies $F(c; n, p) \le \alpha$ for all $p \ge p_0$, and $1 - F(c; n, p_1) \le \beta$ implies $1 - F(c; n, p) \le \beta$ for all $p \le p_1$. Hence, finding a single sampling plan $(n, c)$ with the prescribed strength amounts to solving the following system of non-linear inequalities for the integer variables $n$ and $c$:

$$F(c; n, p_0) \le \alpha \tag{2.4a}$$

$$1 - F(c; n, p_1) \le \beta \tag{2.4b}$$
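The following Python sketch evaluates (2.3) directly and reports the left-hand sides of (2.4a) and (2.4b) for a given plan, together with the probability $L_p = 1 - F(c; n, p)$ of accepting $H_0$. The helper names are hypothetical, and `math.comb` requires Python 3.8 or later. Applied to the plan $(30, 12)$ of Example 2.1 below, it reproduces the values quoted there.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """F(c; n, p) from (2.3): probability that a sum of n Bernoulli(p) variates is at most c."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(c + 1))


def plan_strength(n: int, c: int, p0: float, p1: float) -> tuple[float, float]:
    """Left-hand sides of (2.4a) and (2.4b) for the single sampling plan (n, c)."""
    return binom_cdf(c, n, p0), 1.0 - binom_cdf(c, n, p1)


def accept_h0_prob(n: int, c: int, p: float) -> float:
    """L_p = 1 - F(c; n, p): probability that the plan (n, c) accepts H0."""
    return 1.0 - binom_cdf(c, n, p)


# Plan (30, 12) with p0 = 0.5, p1 = 0.3, alpha = 0.2, beta = 0.1 (Example 2.1).
lhs_a, lhs_b = plan_strength(30, 12, 0.5, 0.3)
print(round(lhs_a, 3), round(lhs_b, 3))  # about 0.181 and 0.084
print([round(accept_h0_prob(30, 12, p), 3) for p in (0.2, 0.4, 0.6)])
```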

This system of inequalities typically has an infinite number of solutions. We generally prefer sampling plans that use small samples (i.e. require few observations) over those that use large samples, so we want to minimize $n$ subject to (2.4a) and (2.4b). The stated optimization problem does not have a simple, closed-form solution, except in a few special cases discussed below. Peach and Littauer (1946) propose using a Poisson approximation to find a suitable single sampling plan. Grubbs (1949) provides tables with optimal sampling plans for $\alpha = 0.05$, $\beta = 0.10$, and $n \le 150$. A graphical solution method is provided by Larson (1966, p. 273). With the widespread availability of fast digital computers, however, these solution methods are essentially obsolete.

Algorithm 2.1 is a procedure for finding an optimal single sampling plan given the parameters $p_0$, $p_1$, $\alpha$, and $\beta$ that specify the hypothesis testing problem and the desired strength of the sampling plan. The algorithm uses binary search to find a minimum sample size, $n$, under the assumption that $c$ does not have to be an integer. It then searches linearly from the minimum to find a valid single sampling plan. The inverse of the function $\tilde{F}(x; n, p) = (F(\lfloor x \rfloor; n, p) + F(\lceil x \rceil; n, p))/2$, for $x \in [0, n]$, is used extensively by the algorithm. For fixed $n$ and $p$, $\tilde{F}(x; n, p)$ is a non-decreasing function of $x$. Thus, $\tilde{F}(x_0; n, p_0) \le \alpha$ implies $\tilde{F}(x; n, p_0) \le \alpha$ for all $x \le x_0$, and $1 - \tilde{F}(x_1; n, p_1) \le \beta$ implies $1 - \tilde{F}(x; n, p_1) \le \beta$ for all $x \ge x_1$. As a consequence, if $x_0 \ge x_1$, then any $x$ in the interval $[x_1, x_0]$ can be used to simultaneously satisfy (2.4a) and (2.4b) for the given $n$. If, on the other hand, $x_0 < x_1$, then we need to use a sample larger than $n$ in order to obtain a test with the desired strength.

Example 2.1. For probability thresholds $p_0 = 0.5$ and $p_1 = 0.3$, and error bounds $\alpha = 0.2$ and $\beta = 0.1$, the optimal single sampling plan found by Algorithm 2.1 is $(30, 12)$. This means that we need a sample of size 30, and we accept the hypothesis $p \ge 0.5$ if and only if the sum of the 30 observations exceeds 12. Figure 2.10 (p. 18) plots the probability $L_p = 1 - F(12; 30, p)$ of accepting the hypothesis $H_0: p \ge 0.5$ as a function of $p$. We can see that for values of $p$ far away from the indifference region, the probability of accepting a false hypothesis is virtually zero. Note also that $1 - F(12; 30, p_1) \approx 0.084 < \beta$ and $F(12; 30, p_0) \approx 0.181 < \alpha$, so the actual strength of the test is better than $(\alpha, \beta)$.
Single-Sampling-Plan(p0, p1, α, β)
    n_min ← 1, n_max ← −1
    n ← n_min
    while n_max < 0 ∨ n_min < n_max do
        x0 ← F̃⁻¹(α; n, p0)
        x1 ← F̃⁻¹(1 − β; n, p1)
        if x0 ≥ x1 ∧ x0 ≥ 0 then
            n_max ← n
        else
            n_min ← n + 1
        if n_max < 0 then
            n ← 2 · n
        else
            n ← ⌊(n_min + n_max)/2⌋
    n ← n_max − 1
    repeat
        n ← n + 1
        c0 ← ⌊F̃⁻¹(α; n, p0)⌋
        c1 ← ⌈F̃⁻¹(1 − β; n, p1)⌉
    until c0 ≥ c1
    return (n, ⌊(c0 + c1)/2⌋)

Algorithm 2.1: Procedure for finding an optimal single sampling plan using binary search. $\tilde{F}^{-1}(y; n, p)$ can be computed by adding the terms of (2.3) until the sum equals or exceeds $y$.
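For small sample sizes, a plan can also be found by brute force. The sketch below is not Algorithm 2.1: it replaces the binary search over $\tilde{F}$ with a plain linear scan over $n = 1, 2, \ldots$, checking every integer $c$ against (2.4a) and (2.4b). Because every $n$ is checked, it returns the smallest feasible sample size by construction, but at roughly quadratic cost in $n$ and with plain floating-point binomial terms, so it is only meant for $n$ up to about 1000 (the cap `n_limit` and all function names are assumptions of this sketch). The midpoint choice of $c$ mirrors the last line of Algorithm 2.1.

```python
from math import comb


def binom_cdf_table(n: int, p: float) -> list[float]:
    """[F(0; n, p), ..., F(n; n, p)] for B(n, p); plain floats, so keep n below about 1000."""
    table, total = [], 0.0
    for i in range(n + 1):
        total += comb(n, i) * p**i * (1.0 - p) ** (n - i)
        table.append(total)
    return table


def single_sampling_plan(p0: float, p1: float, alpha: float, beta: float,
                         n_limit: int = 1000) -> tuple[int, int]:
    """Smallest n (and a matching c) satisfying (2.4a) and (2.4b), found by linear scan.

    For fixed n, (2.4a) holds for every c up to some c_max and (2.4b) for every c
    from some c_min on, so the feasible acceptance numbers form an interval.
    """
    for n in range(1, n_limit + 1):
        f0 = binom_cdf_table(n, p0)
        f1 = binom_cdf_table(n, p1)
        feasible = [c for c in range(n) if f0[c] <= alpha and 1.0 - f1[c] <= beta]
        if feasible:
            # Midpoint of the feasible interval, as in the return line of Algorithm 2.1.
            return n, (feasible[0] + feasible[-1]) // 2
    raise ValueError("no plan with n <= n_limit; a smarter search is needed")


print(single_sampling_plan(0.5, 0.3, 0.2, 0.1))  # Example 2.1: (30, 12)
```

For $p_1 = 0$ or $p_0 = 1$, the scan also agrees with the closed-form sample sizes derived below, e.g. $(7, 0)$ for $p_0 = 0.5$, $p_1 = 0$, $\alpha = 0.01$.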


Sample Sizes


How large a sample is required to obtain a single sampling plan of strength $(\alpha, \beta)$ for testing $H_0: p \ge p_0$ against $H_1: p \le p_1$? In general, we can give only an approximate answer, but there are two special cases for which $n$ can be expressed precisely as a formula of the test parameters.

First, consider the case when $p_1 = 0$ and $p_0 < 1$. From (2.3) it follows that $F(c; n, 0) = 1$ for all choices of $n$ and $c$, so (2.4b) is trivially satisfied. The reasoning behind Algorithm 2.1 tells us that choosing $c$ as low as possible makes it easier to satisfy (2.4a). We therefore set $c = 0$, which gives us $F(c; n, p_0) = (1 - p_0)^n$. We can now derive a lower bound for $n$ from (2.4a); because $\log(1 - p_0) < 0$ for $0 < p_0 < 1$, dividing by it reverses the inequality:


$$(1 - p_0)^n \le \alpha \iff n \log(1 - p_0) \le \log \alpha \iff n \ge \frac{\log \alpha}{\log(1 - p_0)} \tag{2.5}$$

The minimum sample size for $p_1 = 0$ and $p_0 < 1$ is thus $n = \lceil \log \alpha / \log(1 - p_0) \rceil$. Note that $n$ is independent of $\beta$, which makes perfect sense because the given sampling plan will always guarantee a zero probability of accepting $H_0$ when $H_1$ is true.

The second special case is essentially a mirror image of the first: $p_1 > 0$ and $p_0 = 1$. We can see from (2.3) that $F(c; n, 1) = 0$ as long as $c < n$, meaning that (2.4a) is trivial to satisfy. Choosing $c$ as large as possible makes it easier to satisfy (2.4b), so we choose $c = n - 1$. This gives us $1 - F(c; n, p_1) = p_1^n$, and we can now derive a lower bound for $n$ from (2.4b), again reversing the inequality when dividing by $\log p_1 < 0$:

$$p_1^n \le \beta \iff n \log p_1 \le \log \beta \iff n \ge \frac{\log \beta}{\log p_1} \tag{2.6}$$

The optimal sample size is therefore $n = \lceil \log \beta / \log p_1 \rceil$ for $p_1 > 0$ and $p_0 = 1$. As in the previous case, $n$ depends on only one of the error bounds: the probability of accepting $H_1$ when $H_0$ holds is always zero. Table 2.2 summarizes the two cases in which we can express $n$ exactly, and also shows the optimal single sampling plan for the degenerate case when the indifference region is $(0, 1)$.

Example 2.2 (“five nines”). Imagine that we are testing a critical system, and we want to be almost certain that the system almost never fails. Let $p_0 = 1$, $p_1 = 1 - 10^{-5} = 0.99999$, and $\beta = 10^{-10}$. Table 2.2 gives us the single sampling plan $(2302574, 2302573)$ for the specified parameters of the test. This implies that to guarantee a probability of at most $10^{-10}$ of accepting the system as functional when its failure probability is at least $10^{-5}$, we should make over two million observations and accept the system only if we observe no failures.
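The two closed-form cases translate directly into code. In this sketch (hypothetical function names), `math.log1p(-p0)` computes $\log(1 - p_0)$ accurately even when $p_0$ is small, and the ceiling yields the minimal integer $n$ from (2.5) and (2.6). The second call reproduces the plan of Example 2.2.

```python
import math


def n_when_p1_is_zero(p0: float, alpha: float) -> int:
    """Minimal n for p1 = 0 < p0 < 1, from (2.5); the plan is (n, 0)."""
    return math.ceil(math.log(alpha) / math.log1p(-p0))


def n_when_p0_is_one(p1: float, beta: float) -> int:
    """Minimal n for 0 < p1 < 1 = p0, from (2.6); the plan is (n, n - 1)."""
    return math.ceil(math.log(beta) / math.log(p1))


print(n_when_p1_is_zero(0.5, 0.01))  # 7
n = n_when_p0_is_one(1.0 - 1e-5, 1e-10)  # Example 2.2 ("five nines")
print((n, n - 1))  # (2302574, 2302573)
```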

We can derive an approximation formula for $n$ when $p_1 > 0$ and $p_0 < 1$. A binomial distribution $B(n, p)$ has mean $np$ and variance $np(1 - p)$. Let $Y = \left(\sum_{i=1}^{n} X_i - np\right) / \sqrt{np(1 - p)}$, where each $X_i$ is a Bernoulli variate with parameter $p$ as before. Then $Y$ is approximately normal with mean 0 and variance 1 for large $n$, as first shown by De Moivre (1738).²

² Pearson (1924) aids the modern statistician in understanding the contribution of De Moivre.

In other words, $\Pr[Y \le x] \approx \Phi(x)$, with $\Phi(x)$ being the standard normal cumulative distribution function given by (2.1). We accept hypothesis $H_1$ if $\sum_{i=1}^{n} x_i \le c$ for some constant $c$, so the probability of accepting $H_1$ is approximately $\Phi\left((c - np)/\sqrt{np(1 - p)}\right)$. The optimal single sampling plan should accept $H_1$ with probability $\alpha$ if $p = p_0$ and with probability $1 - \beta$ if $p = p_1$. Using the inverse of $\Phi(x)$ and the fact that $\Phi(x) = 1 - \Phi(-x)$, we can express these constraints as follows:


$$\frac{c - np_0}{\sqrt{np_0(1 - p_0)}} = \Phi^{-1}(\alpha) \tag{2.7a}$$

$$\frac{c - np_1}{\sqrt{np_1(1 - p_1)}} = -\Phi^{-1}(\beta) \tag{2.7b}$$


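Multiplying (2.7a) and (2.7b) by their denominators and subtracting the second equation from the first, we can derive an approximation formula for $n$:

$$\begin{aligned}
(c - np_0) - (c - np_1) &= \Phi^{-1}(\alpha)\sqrt{np_0(1 - p_0)} + \Phi^{-1}(\beta)\sqrt{np_1(1 - p_1)} \\
\sqrt{n}\,(p_1 - p_0) &= \Phi^{-1}(\alpha)\sqrt{p_0(1 - p_0)} + \Phi^{-1}(\beta)\sqrt{p_1(1 - p_1)} \\
n &= \frac{\left(\Phi^{-1}(\alpha)\sqrt{p_0(1 - p_0)} + \Phi^{-1}(\beta)\sqrt{p_1(1 - p_1)}\right)^2}{(p_0 - p_1)^2}
\end{aligned} \tag{2.8}$$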

Thus, the sample size for a single sampling plan is approximately inversely proportional to the squared width of the indifference region. The presence of the factors $\sqrt{p_i(1 - p_i)}$ in the numerator indicates that the sample size also depends on the placement of the indifference region. For a fixed width, the sample size is largest if the indifference region is centered around $p = 1/2$, and it decreases if the indifference region is shifted towards $p = 0$ or $p = 1$.

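To get an idea of how the sample size depends on $\alpha$ and $\beta$, we can use the following approximation formula for the inverse normal cumulative distribution function, valid for $0 < \alpha \le 1/2$, with $\eta = \sqrt{-\log \alpha^2}$ (Hastings 1955, p. 191):

$$\Phi^{-1}(\alpha) \approx \tilde{\Phi}^{-1}(\alpha) = -\eta + \frac{a_0 + a_1 \eta}{1 + b_1 \eta + b_2 \eta^2}, \qquad \left|\Phi^{-1}(\alpha) - \tilde{\Phi}^{-1}(\alpha)\right| < 3 \cdot 10^{-3} \tag{2.9}$$

$$a_0 = 2.30753, \quad a_1 = 0.27061, \quad b_1 = 0.99229, \quad b_2 = 0.04481$$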

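This means that $n$ is roughly proportional to the logarithm of $\alpha$ and $\beta$ (in magnitude), because the term $-\eta$ dominates (2.9) and $\eta^2 = -2\log\alpha$. Consequently, decreasing $\alpha$ or $\beta$ tends to be less costly than narrowing the indifference region.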

Example 2.3. For probability thresholds $p_0 = 0.505$ and $p_1 = 0.495$, and error bounds $\alpha = \beta = 10^{-2}$, the approximation formulae (2.8) and (2.9) give us $n \approx 54174$. The true value for $n$, computed by Algorithm 2.1, is 54117. If we keep the same error bounds, but shift the indifference region by setting $p_0 = 0.905$ and $p_1 = 0.895$, we get 19490 as the approximate sample size and 19481 as the exact.
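The approximate sample sizes in Example 2.3 can be reproduced with a short script. The following Python sketch is only an illustration (it is not Algorithm 2.1): it evaluates (2.8) with $\Phi^{-1}$ replaced by the rational approximation (2.9), assuming $0 < \alpha, \beta \leq 1/2$ and $p_0 > p_1$. The function names are hypothetical. For comparison, the sketch can also use the exact inverse normal CDF from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Coefficients of the rational approximation (2.9) (Hastings 1955).
A0, A1 = 2.30753, 0.27061
B1, B2 = 0.99229, 0.04481


def inv_phi_hastings(alpha: float) -> float:
    """Approximate Phi^{-1}(alpha) via (2.9); intended for 0 < alpha <= 1/2."""
    eta = math.sqrt(-2.0 * math.log(alpha))  # eta = sqrt(-log alpha^2)
    return -eta + (A0 + A1 * eta) / (1.0 + B1 * eta + B2 * eta * eta)


def approx_sample_size(p0, p1, alpha, beta, inv_phi=inv_phi_hastings):
    """Normal approximation (2.8) of n for a single sampling plan (p0 > p1)."""
    spread = (inv_phi(alpha) * math.sqrt(p0 * (1.0 - p0))
              + inv_phi(beta) * math.sqrt(p1 * (1.0 - p1)))
    return spread ** 2 / (p0 - p1) ** 2


# Example 2.3: indifference region centred at 1/2, then shifted towards p = 1.
n_centred = approx_sample_size(0.505, 0.495, 1e-2, 1e-2)
n_shifted = approx_sample_size(0.905, 0.895, 1e-2, 1e-2)
# Same formula, but with the exact inverse normal CDF instead of (2.9).
n_exact_inv = approx_sample_size(0.505, 0.495, 1e-2, 1e-2,
                                 inv_phi=NormalDist().inv_cdf)
print(round(n_centred), round(n_shifted), round(n_exact_inv))
```

With the parameters of Example 2.3, the first two values come out close to the 54174 and 19490 quoted there. Substituting the exact inverse CDF for (2.9) lowers the first estimate slightly, to a little over 54100. Both numbers are approximations, and the exact sample size still requires a procedure such as Algorithm 2.1.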
Simple-Sequential-Test(p_0, p_1, α, β)
    (n, c) ⇐ Single-Sampling-Plan(p_0, p_1, α, β)
    m ⇐ 0, d_m ⇐ 0
    while d_m ≤ c ∧ d_m + n − m > c do
        m ⇐ m + 1
        d_m ⇐ d_{m−1} + x_m
    if d_m > c then
        return H_0
    else
        return H_1


Algorithm  2.2:  Sequential  acceptance  sampling  procedure  based  on  a  single  sampling  plan. 


2.2.3  Sequential  Acceptance  Sampling 

The  sample  size  for  a  single  sampling  plan  is  fixed  and  therefore  independent  of  the  actual  observations 
made.  It  is  often  possible,  however,  to  reduce  the  expected  number  of  observations  required  to  achieve  a 
desired  test  strength  by  taking  the  observations  into  account  as  they  are  made. 


Sequential  Modification  of  Single  Sampling  Plan 

If we use a single sampling plan $(n, c)$ and the sum of the first $m$ observations ($m < n$) is already greater than $c$, then we can accept $H_0$ without making further observations. Conversely, if the sum of the first $m$ observations is $d_m$, and $d_m + n - m \leq c$, then regardless of the outcome of the remaining $n - m$ observations we already know that the sum of $n$ observations will not exceed $c$, so we can safely accept $H_1$ after making only $m$ observations. The modified test procedure, summarized in Algorithm 2.2, is a simple example of a sequential sampling plan: after each observation, we decide whether sufficient information is available to accept either of the two hypotheses or whether additional observations are required.
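A minimal Python sketch of Algorithm 2.2 follows. It assumes the plan $(n, c)$ has already been computed (for instance by Algorithm 2.1, which is not reproduced here) and that observations arrive as 0/1 values from an iterable. The function name and the `(hypothesis, m)` return value are illustrative choices and are not part of the original algorithm.

```python
from typing import Iterable, Tuple


def simple_sequential_test(n: int, c: int,
                           observations: Iterable[int]) -> Tuple[str, int]:
    """Sequential modification of the single sampling plan (n, c).

    Returns the accepted hypothesis ("H0" or "H1") together with the number
    of observations m that were actually used (never more than n).
    """
    source = iter(observations)
    m, d = 0, 0  # stage m and running sum d_m
    # Keep sampling while H0 is not yet accepted (d_m <= c) and the remaining
    # n - m observations could still push the sum above c.
    while d <= c and d + n - m > c:
        x = next(source, None)
        if x is None:
            raise ValueError("observations exhausted before the test terminated")
        m += 1
        d += x
    return ("H0" if d > c else "H1"), m


# Plan from Example 2.1 (n = 30, c = 12).
print(simple_sequential_test(30, 12, [1] * 30))  # stops once d_m exceeds c
print(simple_sequential_test(30, 12, [0] * 30))  # stops once H0 is out of reach
```

With all ones the test stops after $c + 1 = 13$ observations, and with all zeros it stops after $n - c = 18$. In both cases the test uses fewer than the $n = 30$ observations of the fixed-size plan.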


The  Sequential  Probability  Ratio  Test 

The idea of reducing the expected sample size by taking observations into account as they are made was first explored by Dodge and Romig (1929), who constructed double sampling plans where a second sample is drawn only if the observations constituting the first sample do not give sufficient support for accepting a hypothesis. A general theory of sequential hypothesis testing was later developed in a seminal paper by Wald (1945), where the sequential probability ratio test is defined. This test is provably optimal in the sense that it minimizes the expected sample size if $p = p_0$ or $p = p_1$ (Wald and Wolfowitz 1948), and the expected savings in the number of required observations compared to a single sampling plan are often substantial, even if we use the sequential modification of the latter.

The sequential probability ratio test is carried out as follows. At the $m$th stage of the test, i.e. after making $m$ observations $x_1, \ldots, x_m$, we calculate the quantity

$$\frac{p_{1m}}{p_{0m}} = \prod_{i=1}^{m} \frac{\Pr[X_i = x_i \mid p = p_1]}{\Pr[X_i = x_i \mid p = p_0]} = \frac{p_1^{d_m}(1-p_1)^{m-d_m}}{p_0^{d_m}(1-p_0)^{m-d_m}}, \tag{2.10}$$

where $d_m = \sum_{i=1}^{m} x_i$. The quantity $p_{jm}$ is simply the probability of the observation sequence $x_1, \ldots, x_m$, given that $\Pr[X_i = 1] = p_j$. This makes the computed quantity a ratio of two probabilities, hence the phrase probability ratio in the name of the test. Hypothesis $H_0$ is accepted if

$$\frac{p_{1m}}{p_{0m}} \leq B, \tag{2.11}$$

and hypothesis $H_1$ is accepted if

$$\frac{p_{1m}}{p_{0m}} \geq A. \tag{2.12}$$

Otherwise, additional observations are made until either (2.11) or (2.12) is satisfied. $A$ and $B$, with $A > B$, are chosen so that the probability is at most $\alpha$ of accepting $H_1$ when $H_0$ holds, and at most $\beta$ of accepting $H_0$ when $H_1$ holds.

Finding $A$ and $B$ that give strength $(\alpha, \beta)$ is non-trivial. In practice we choose $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, which results in a test that very closely matches the prescribed strength. Let the actual strength of this test be $(\alpha', \beta')$. Wald (1945, p. 131) shows that the following inequalities hold:

$$\alpha' \leq \frac{\alpha}{1 - \beta} \tag{2.13}$$

$$\beta' \leq \frac{\beta}{1 - \alpha} \tag{2.14}$$


This means that if $\alpha$ and $\beta$ are small, which typically is the case in practical applications, then $\alpha'$ and $\beta'$ can only narrowly exceed the target values. Wald (1945, p. 132) also shows that $\alpha' + \beta' \leq \alpha + \beta$, so at least one of the inequalities $\alpha' \leq \alpha$ and $\beta' \leq \beta$ must hold, and in practice we often find that both inequalities hold.


Example 2.4. Let $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$ and $\beta = 0.1$ as in Example 2.1. If we use $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$, then we are guaranteed that $\alpha' \leq 0.2/0.9 \approx 0.222$ and $\beta' \leq 0.1/0.8 = 0.125$ by the inequalities (2.13) and (2.14). Through computer simulation we obtain the estimates $\alpha' \approx 0.175 < \alpha$ and $\beta' \approx 0.082 < \beta$, so the strength of the test is in reality better than $(\alpha, \beta)$.

If $p_0 = 1$ or $p_1 = 0$, then the sequential probability ratio test is equivalent to the test procedure encoded by Algorithm 2.2, provided that we choose $A = \alpha^{-1}$ and $B = \beta$. For $p_0 = 1$ and $x_i = 1$ for all $i$ up to and including $m$, the probability ratio (2.10) equals $p_1^m$. We therefore accept $H_0$ if $p_1^m \leq \beta$, which is identical to the condition in (2.6). If, on the other hand, we observe a single zero before condition (2.11) is satisfied, the probability ratio becomes $\infty$ and we immediately accept $H_1$, corresponding to choosing $c = n - 1$ for a single sampling plan. For $p_1 = 0$, the probability ratio equals $(1 - p_0)^{-m}$ if the first $m$ observations are zeros. We accept $H_1$ if $(1 - p_0)^{-m} \geq \alpha^{-1}$, which is equivalent to the condition in (2.5). In this case, we accept $H_0$ if we observe a single one before condition (2.12) is satisfied, corresponding to $c = 0$ for a single sampling plan. Anderson and Friedman (1960) call sampling plans of this kind curtailed single sampling plans, and they prove that such plans are strongly optimal. This means that any other sampling plan with at least the same strength always requires at least as many observations, for all values of $p$. In general, as mentioned above, the sequential probability ratio test only guarantees expected optimality for $p \in \{p_0, p_1\}$.

When implementing the sequential probability ratio test, it is typically computationally more practical to work with the logarithm of $p_{1m}/p_{0m}$. At stage $m$, we therefore compute

$$f_m = \log\frac{p_{1m}}{p_{0m}} = d_m \log\frac{p_1}{p_0} + (m - d_m)\log\frac{1 - p_1}{1 - p_0}.$$

We accept $H_0$ if $f_m \leq \log\frac{\beta}{1-\alpha}$, accept $H_1$ if $f_m \geq \log\frac{1-\beta}{\alpha}$, and make at least one more observation otherwise. Pseudocode for the sequential probability ratio test is given as Algorithm 2.3.
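Below is a minimal Python sketch of the log-space test in Algorithm 2.3. It assumes $0 < p_1 < p_0 < 1$, so that every logarithm is finite. The boundary cases are delegated to Algorithm 2.2 in the pseudocode and are simply rejected here. The helper `estimate_strength` is a hypothetical addition that mimics the kind of simulation used in Example 2.4 to estimate $(\alpha', \beta')$. It is not the procedure that produced the numbers quoted there.

```python
import math
import random
from typing import Callable, Tuple


def sprt(p0: float, p1: float, alpha: float, beta: float,
         sample: Callable[[], int]) -> Tuple[str, int]:
    """Log-space sequential probability ratio test (generic branch of Algorithm 2.3).

    Assumes 0 < p1 < p0 < 1; sample() returns the next observation (0 or 1).
    Returns the accepted hypothesis and the number of observations used.
    """
    if not 0.0 < p1 < p0 < 1.0:
        raise ValueError("this sketch requires 0 < p1 < p0 < 1")
    log_a = math.log((1.0 - beta) / alpha)        # accept H1 when f_m >= log A
    log_b = math.log(beta / (1.0 - alpha))        # accept H0 when f_m <= log B
    inc_one = math.log(p1 / p0)                   # increment of f_m for x_m = 1
    inc_zero = math.log((1.0 - p1) / (1.0 - p0))  # increment of f_m for x_m = 0
    m, f = 0, 0.0
    while log_b < f < log_a:
        m += 1
        f += inc_one if sample() == 1 else inc_zero
    return ("H0" if f <= log_b else "H1"), m


def estimate_strength(p0, p1, alpha, beta, trials=10000, seed=1):
    """Monte Carlo estimates of the actual error probabilities (alpha', beta')."""
    rng = random.Random(seed)

    def bernoulli(p):
        return lambda: 1 if rng.random() < p else 0

    wrong_h1 = sum(sprt(p0, p1, alpha, beta, bernoulli(p0))[0] == "H1"
                   for _ in range(trials))
    wrong_h0 = sum(sprt(p0, p1, alpha, beta, bernoulli(p1))[0] == "H0"
                   for _ in range(trials))
    return wrong_h1 / trials, wrong_h0 / trials


# Replay the observation sequence plotted in Figures 2.11 and 2.12.
replay = iter([1, 0, 1, 1, 0, 1, 1, 1])
decision = sprt(0.5, 0.3, 0.2, 0.1, lambda: next(replay))
alpha_hat, beta_hat = estimate_strength(0.5, 0.3, 0.2, 0.1)
print(decision, alpha_hat, beta_hat)
```

The replayed sequence leads to acceptance of $H_0$ at the eighth observation. The simulated error rates should fall below the bounds $0.2/0.9 \approx 0.222$ and $0.1/0.8 = 0.125$ from Example 2.4. The sketch takes a sampling callback rather than a list because the number of observations the test will need is not known in advance.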

Geometric  Interpretation  of  Sequential  Tests 

To gain a better understanding of how sequential tests work, it helps to give such tests a geometric interpretation. At stage $m$ of a sequential test, we summarize the $m$ observations made so far with the statistic $d_m$. The pair $(m, d_m)$ can be considered as the current state of the test, where $m$ and $d_m$ are non-negative integers with $d_m \leq m$. The two-dimensional space $S = \{(m, d_m) \in \mathbb{Z}^* \times \mathbb{Z}^* \mid d_m \leq m\}$ constitutes the possible states of a sequential test. Any given sequential test procedure subdivides the space $S$ into three mutually exclusive regions $R_0$, $R_1$, and $R_c$ ("continue"). The test is terminated the first time the state of the test enters either $R_0$ or $R_1$. On entering the subregion $R_i$, hypothesis $H_i$ is accepted.
SPRT(p_0, p_1, α, β)
    if p_0 = 1 ∨ p_1 = 0 then
        return Simple-Sequential-Test(p_0, p_1, α, β)
    else
        m ⇐ 0, f_m ⇐ 0
        while log(β/(1 − α)) < f_m < log((1 − β)/α) do
            m ⇐ m + 1
            f_m ⇐ f_{m−1} + x_m log(p_1/p_0) + (1 − x_m) log((1 − p_1)/(1 − p_0))
        if f_m ≤ log(β/(1 − α)) then
            return H_0
        else
            return H_1

Algorithm  2.3:  Procedure  implementing  the  sequential  probability  ratio  test. 


The subregion $R_c$ represents states where additional observations are required. This region always contains the point $(0, 0)$, meaning that a sequential test starts in this region.

For a sequential test derived from a single sampling plan $(n, c)$, we never make more than $n$ observations, so the state space of such a test is $S' = \{(m, d_m) \in S \mid m \leq n\}$. We accept $H_0$ if $d_m > c$. Thus, we set $R_0 = \{(m, d_m) \in S' \mid d_m > c\}$. For the same test, we accept $H_1$ if $d_m \leq m + c - n$, so $R_1 = \{(m, d_m) \in S' \mid d_m \leq m + c - n\}$. Figure 2.11 displays the regions graphically for $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$, and $\beta = 0.1$ (i.e. $n = 30$ and $c = 12$ as stated in Example 2.1). The shaded regions represent unreachable states ($d_m > m$ or $m > n$). The line $d_m = c$ that defines the boundary between $R_c$ and $R_0$ is called the acceptance line, while the line $d_m = m + c - n$ defining the boundary between $R_c$ and $R_1$ is called the rejection line. The test can be carried out graphically by plotting a curve representing the outcome of the observations. The solid curve in Figure 2.11 represents the observations $x_i = 1$ for $i \in \{1, 3, 4, 6, 7, 8\}$ and $x_i = 0$ for $i \in \{2, 5\}$. Hypothesis $H_0$ is accepted the moment this curve intersects the acceptance line, and $H_1$ is accepted ($H_0$ is rejected) the moment the curve intersects the rejection line.

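In contrast, the sequential probability ratio test terminates if $f_m \leq \log\frac{\beta}{1-\alpha}$ (accept $H_0$) or $f_m \geq \log\frac{1-\beta}{\alpha}$ (accept $H_1$). We can write these termination criteria as $d_m \geq h_0 + ms$ and $d_m \leq h_1 + ms$ respectively, where $h_0$, $h_1$, and $s$ are given by the following expressions:

$$h_0 = \frac{\log\frac{1-\alpha}{\beta}}{\log\frac{p_0(1-p_1)}{p_1(1-p_0)}}, \qquad h_1 = \frac{\log\frac{\alpha}{1-\beta}}{\log\frac{p_0(1-p_1)}{p_1(1-p_0)}}, \qquad s = \frac{\log\frac{1-p_1}{1-p_0}}{\log\frac{p_0(1-p_1)}{p_1(1-p_0)}} \tag{2.15}$$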
Figure 2.11: Graphical representation of a sequential single sampling plan for $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$, and $\beta = 0.1$ ($n = 30$ and $c = 12$).

Figure 2.12: Graphical representation of the sequential probability ratio test for $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$, and $\beta = 0.1$.

We can therefore define the acceptance region $R_0 = \{(m, d_m) \in S \mid d_m \geq h_0 + ms\}$ for the sequential probability ratio test. The line $d_m = h_0 + ms$ is the acceptance line for the test. Similarly, $R_1 = \{(m, d_m) \in S \mid d_m \leq h_1 + ms\}$, making $d_m = h_1 + ms$ the rejection line for the test.

Figure 2.12 shows a graphical representation of the sequential probability ratio test for the same parameters that were used in Figure 2.11. The solid curve represents the same observation sequence as was plotted in Figure 2.11. Note that the curve intersects the acceptance line with the eighth observation, so we accept the hypothesis $H_0: p \geq 0.5$ at this point if we use the sequential probability ratio test. The same observation sequence does not result in acceptance in Figure 2.11, which indicates that we can reduce the expected number of observations by using the sequential probability ratio test. The acceptance and rejection lines are parallel with common slope $s$. Consequently, the region $R_c$ is unbounded and there is no upper bound on the number of observations that the test will require before terminating. However, the sequential probability ratio test terminates with probability one (Wald 1945, p. 128), although the sample size may vary greatly.
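The intercepts and common slope in (2.15) follow from dividing the termination conditions on $f_m$ by $\log\frac{p_0(1-p_1)}{p_1(1-p_0)}$, which is positive when $p_0 > p_1$. The Python sketch below computes $h_0$, $h_1$, and $s$, and then walks the observation path $(m, d_m)$ until it leaves $R_c$. This lets one check the eighth-observation crossing described for Figure 2.12. The helper names are hypothetical, and the sketch assumes $0 < p_1 < p_0 < 1$.

```python
import math


def sprt_lines(p0, p1, alpha, beta):
    """Intercepts h0, h1 and common slope s of the SPRT lines, cf. (2.15).

    Assumes 0 < p1 < p0 < 1. H0 is accepted once d_m >= h0 + m*s and
    H1 once d_m <= h1 + m*s.
    """
    denom = math.log(p0 * (1 - p1) / (p1 * (1 - p0)))
    h0 = math.log((1 - alpha) / beta) / denom
    h1 = math.log(alpha / (1 - beta)) / denom
    s = math.log((1 - p1) / (1 - p0)) / denom
    return h0, h1, s


def first_crossing(observations, h0, h1, s):
    """Return (hypothesis, m) where the path (m, d_m) first leaves R_c, or None."""
    d = 0
    for m, x in enumerate(observations, start=1):
        d += x
        if d >= h0 + m * s:
            return "H0", m
        if d <= h1 + m * s:
            return "H1", m
    return None


h0, h1, s = sprt_lines(0.5, 0.3, 0.2, 0.1)
print(round(h0, 3), round(h1, 3), round(s, 3))
print(first_crossing([1, 0, 1, 1, 0, 1, 1, 1], h0, h1, s))
```

For these parameters, the slope $s$ is roughly 0.397, which lies inside the indifference region $(0.3, 0.5)$. The path first meets the acceptance line at $m = 8$, where $d_8 = 6$.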

Expected  Sample  Sizes 

The sample size for a sequential acceptance sampling test is a random variable, meaning that the required number of observations can vary from one use of such a test to another. Furthermore, the expected sample size typically depends on the unknown parameter $p$, so we cannot report a single value as was the case for acceptance sampling with fixed-size samples. The expected sample size varies with the distance of $p$ from the indifference region $(p_1, p_0)$. It tends to be largest when $p$ is close to the center of the indifference region, and decreases the further away $p$ is from the indifference region.

First, consider the sequential variation of a single sampling plan $(n, c)$. The test terminates at stage $m$ if $d_m > c$ (accept $H_0$) or $d_m \leq m + c - n$ (accept $H_1$). The probability of the test terminating at stage $m$ by accepting $H_0$ is equal to the probability of observing exactly $c$ ones in the first $m - 1$ observations and then an additional one. This probability can be expressed as $p \cdot f(c; m - 1, p)$, where $p$ is the probability of observing a one and $f(c; n, p)$ is the probability density function for $B(n, p)$. Note that we could not have accepted $H_1$ prior to stage $m$ under these conditions, because we accept $H_1$ only if the remaining observations cannot lead to acceptance of $H_0$. The test terminates at stage $m$ by accepting $H_1$ if we observe exactly $m + c - n$ ones in the first $m - 1$ observations followed by a zero, which occurs with probability $(1 - p) \cdot f(m + c - n; m - 1, p)$. The expected sample size $E_p$ as a function of $p$ can therefore be expressed as follows:

$$E_p = \sum_{m=c+1}^{n} m \cdot p \cdot f(c; m - 1, p) + \sum_{m=n-c}^{n} m \cdot (1 - p) \cdot f(m + c - n; m - 1, p) \tag{2.16}$$

Naturally, $E_p$ can never exceed $n$, is exactly $n - c$ if $p = 0$, and is exactly $c + 1$ if $p = 1$.
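Equation (2.16) is straightforward to evaluate numerically. The Python sketch below implements $f(k; n, p)$ with `math.comb` and sums the two series. The function names are hypothetical. A companion function adds up the termination probabilities alone, which should equal one because the sequential single sampling plan always stops by stage $n$.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """f(k; n, p): probability of exactly k ones in n Bernoulli(p) trials."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def stage_probabilities(n: int, c: int, p: float):
    """Yield (m, probability of stopping at stage m) for both hypotheses."""
    for m in range(c + 1, n + 1):   # accept H0: c ones, then a one
        yield m, p * binom_pmf(c, m - 1, p)
    for m in range(n - c, n + 1):   # accept H1: m + c - n ones, then a zero
        yield m, (1.0 - p) * binom_pmf(m + c - n, m - 1, p)


def expected_sample_size(n: int, c: int, p: float) -> float:
    """E_p from (2.16) for the sequential version of the plan (n, c)."""
    return sum(m * prob for m, prob in stage_probabilities(n, c, p))


def total_stop_probability(n: int, c: int, p: float) -> float:
    """Sum of all stage-wise termination probabilities (should be 1)."""
    return sum(prob for _, prob in stage_probabilities(n, c, p))


for p in (0.0, 0.3, 0.4, 0.5, 1.0):
    print(p, round(expected_sample_size(30, 12, p), 3))
```

For the plan $(30, 12)$ from Example 2.1, the sketch returns exactly $n - c = 18$ at $p = 0$ and $c + 1 = 13$ at $p = 1$. This matches the boundary values stated above.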

The expected sample size for the sequential probability ratio test is harder to determine. Wald (1945, p. 164) provides

$$E_p \approx \frac{L_p \log\frac{\beta}{1-\alpha} + (1 - L_p)\log\frac{1-\beta}{\alpha}}{p\log\frac{p_1}{p_0} + (1 - p)\log\frac{1-p_1}{1-p_0}} \tag{2.17}$$

as a good approximation of $E_p$ when $p_1$ is not far from $p_0$, which is typically the case in practice. The quantity $L_p$ is the probability of accepting $H_0$ when $\Pr[X_i = 1] = p$. Wald provides an approximation formula for $L_p$ as well, but the formula is not suited for computing an approximation of $L_p$ for an arbitrary $p$. Approximating $E_p$ for an arbitrary $p$ is therefore non-trivial, but we can provide explicit formulae for a few cases of special interest, as shown in Table 2.3.³ The expected sample size increases as $p$ goes from 0 to $p_1$ and decreases as $p$ goes from $p_0$ to 1. In the indifference region $(p_1, p_0)$, the expected sample size increases from $p_1$ to some point $p'$ and decreases from $p'$ to $p_0$. The point $p'$ is generally equal to $s$, or at least very near $s$, where $s$ is the common slope of the acceptance and rejection lines given in (2.15) (Wald 1947, p. 101).
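The sketch below evaluates (2.17) in Python. It is an illustration and not a reconstruction of Table 2.3, and it rests on two stated assumptions. First, it uses the nominal values $L_{p_0} \approx 1 - \alpha$ and $L_{p_1} \approx \beta$ at the two ends of the indifference region. Second, at $p = s$, where the denominator of (2.17) vanishes, it uses Wald's general approximation $-\log A \log B / E_s[z^2]$ for a zero-drift increment $z$. For Bernoulli observations this reduces to $\log A \log B / \big(\log\frac{p_1}{p_0} \log\frac{1-p_1}{1-p_0}\big)$. The function names are hypothetical.

```python
import math


def _log_bounds(alpha, beta):
    """log A and log B for Wald's choice A = (1-beta)/alpha, B = beta/(1-alpha)."""
    return math.log((1.0 - beta) / alpha), math.log(beta / (1.0 - alpha))


def wald_asn(p, l_p, p0, p1, alpha, beta):
    """Approximation (2.17) of E_p, given L_p = Pr[accept H0 | p]."""
    log_a, log_b = _log_bounds(alpha, beta)
    drift = p * math.log(p1 / p0) + (1.0 - p) * math.log((1.0 - p1) / (1.0 - p0))
    if drift == 0.0:
        raise ValueError("(2.17) is undefined where the drift is zero (p = s)")
    return (l_p * log_b + (1.0 - l_p) * log_a) / drift


def wald_asn_at_s(p0, p1, alpha, beta):
    """Zero-drift case p = s: -log A log B / E_s[z^2], where E_s[z^2] = -a*b."""
    log_a, log_b = _log_bounds(alpha, beta)
    a = math.log(p1 / p0)
    b = math.log((1.0 - p1) / (1.0 - p0))
    return (log_a * log_b) / (a * b)


params = (0.5, 0.3, 0.2, 0.1)           # p0, p1, alpha, beta as in Example 2.1
e_p0 = wald_asn(0.5, 1 - 0.2, *params)  # L_{p0} taken as 1 - alpha
e_p1 = wald_asn(0.3, 0.1, *params)      # L_{p1} taken as beta
e_s = wald_asn_at_s(*params)
print(round(e_p0, 2), round(e_p1, 2), round(e_s, 2))
```

Consistent with the qualitative description above, the sketch gives a larger approximate expected sample size at $s$ than at either end of the indifference region. The caption of Figure 2.13 notes that such approximations can be noticeably off when the indifference region is as wide as in this example.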

³The approximation formulae for $p = 0$ and $p = 1$ differ from those derived by Wald (1947, pp. 99–100). This is because we assume $p_0 > p_1$, while Wald assumes the opposite.
Table 2.3: Approximate expected sample size for the sequential probability ratio test.
Figure 2.13: Expected sample size for a sequential single sampling plan (dashed curve) and the sequential probability ratio test (solid curve) with $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$, and $\beta = 0.1$. The error bars extend a standard deviation in each direction from the curves. The crosses mark the approximate expected sample size for the cases listed in Table 2.3. The indifference region is fairly wide in this case, resulting in a relatively large approximation error. For a narrower indifference region, the approximation error is generally much less noticeable.


Figure 2.13 plots the expected sample size as a function of the true probability $p$ for the sequential single sampling plan and the sequential probability ratio test with $p_0 = 0.5$, $p_1 = 0.3$, $\alpha = 0.2$, and $\beta = 0.1$. The curve for the former was computed using (2.16), while the curve for the latter was generated using computer simulation. We see that the sequential probability ratio test has a lower average than the sequential test derived from a single sampling plan, but that the variance is much larger when $p$ is in, or close to, the indifference region. As we will see next, however, the sequential probability ratio test does not always have a lower expected sample size than a sequential single sampling plan with the same strength.


Optimality  of  Sequential  Tests 

For the particular choice of parameters that was used to produce Figure 2.13, the sequential probability ratio test has a lower expected sample size than an optimal single sampling plan for all values of $p$. In general, however, this is not guaranteed to be the case. While the sequential probability ratio test minimizes the expected sample size at $p_0$ and $p_1$ simultaneously, there may very well exist alternative tests that achieve a lower expected sample size for other values of $p$, in particular for $p \in (p_1, p_0)$.

Example 2.5. For $p_0 = 0.5$, $p_1 = 0.3$, and $\alpha = \beta = 10^{-4}$, the optimal single sampling plan requires exactly 326 observations. In contrast, the expected sample size for the sequential probability ratio test is 510 at $p = s$, which is a 56 percent increase in the expected sample size compared to a single sampling plan.

It is easy to see that the expected sample size at $p = s$ for the sequential probability ratio test can be larger than the fixed sample size of a single sampling plan if $\alpha$ and $\beta$ are sufficiently small. Consider the case when $\alpha = \beta$. From the approximation formula for $p = s$ in Table 2.3, it follows that the numerator of $E_s$ is equal to $(\log(\alpha^{-1} - 1))^2$, which means that $E_s$ is approximately proportional to the square of $\log\alpha$. From (2.8) and (2.9), on the other hand, it follows that the sample size for a single sampling plan is approximately proportional to $\log\alpha$. As $\alpha$ approaches zero, $(\log\alpha)^2$ grows faster than $|\log\alpha|$, which helps explain the fact that $E_s$ can be larger for the sequential probability ratio test than for a single sampling plan.

Kiefer and Weiss (1957) suggest minimizing the expected sample size at a third point $p_2$, instead of at $p_0$ and $p_1$, by using a generalized sequential probability ratio test.⁴ If $p_2$ is chosen with care, the resulting test minimizes the maximum expected sample size. Weiss (1962) derives such a test for the symmetric case with $p_0 = \frac{1}{2} + \delta$ and $p_1 = \frac{1}{2} - \delta$, while Freeman and Weiss (1964) consider approximate solutions for the general case. The test is designed to minimize

$$b_0 \Pr[H_1 \text{ accepted} \mid p = p_0] + b_1 \Pr[H_0 \text{ accepted} \mid p = p_1] + b_2 E_{p_2},$$

where $b_0$, $b_1$, and $b_2$ are user-specified positive constants such that $b_0 + b_1 + b_2 = 1$. For some choice of these constants, the resulting test has strength $(\alpha, \beta)$, although the exact relationship is unknown (Freeman and Weiss 1964, p. 69). While this surely is an interesting alternative problem formulation, we will not explore it further in this thesis because it represents a departure from the model where the user specifies the desired strength of the test. Schwarz (1962) and Lai (1988) consider yet another problem formulation where the objective is to minimize the expected cost subject to a cost $c$ per observation and a unit cost for accepting a false hypothesis. We refer the interested reader to Lai (2001) for a more detailed account of the developments in the field of sequential hypothesis testing since the ground-breaking work of Wald.

⁴The condition for making an additional observation at stage $m$ when using a generalized sequential probability ratio test is $B_m < p_{1m}/p_{0m} < A_m$ (Weiss 1953). The test is a regular sequential probability ratio test if $A_m = A$ and $B_m = B$ for all $m$.
2.3 Stochastic Discrete Event Systems

This  section  formally  defines  the  class  of  systems  for  which  we  develop  verification  and  planning  algorithms 
in  later  chapters.  We  rely  heavily  on  the  notion  of  a  stochastic  process,  which  is  any  process  that  evolves 
over  time,  and  whose  evolution  we  can  follow  and  predict  in  terms  of  probability  (Doob  1942,  1953).  At 
any  point  in  time,  a  stochastic  process  is  said  to  occupy  some  state.  If  we  attempt  to  observe  the  state  of  a 
stochastic  process  at  a  specific  time,  the  outcome  of  such  an  observation  is  governed  by  some  probability 
law.  Mathematically,  we  define  a  stochastic  process  as  a  family  of  random  variables. 

Definition 2.1 (Stochastic Process). Let $S$ and $T$ be two sets. A stochastic process is a family of random variables $X = \{X_t \mid t \in T\}$, with each random variable $X_t$ having range $S$.

The index set $T$ in Definition 2.1 represents time and is typically the set of non-negative integers, $\mathbb{Z}^*$, for discrete-time stochastic processes and the set of non-negative real numbers, $[0, \infty)$, for continuous-time stochastic processes. We will generally assume that $T$ is such that if $t \in T$ and $t' \in T$ for $t' > t$, then $t' - t \in T$. The set $S$ represents the states that the stochastic process can occupy, and this can be an infinite, or even uncountable, set.

The definition of a stochastic process as a family of random variables is quite general and includes systems with both continuous and discrete dynamics. We will focus our attention on a limited, but important, class of stochastic processes: stochastic discrete event systems. This class includes any stochastic process that can be thought of as occupying a single state for a duration of time before an event causes an instantaneous state transition to occur. The canonical example of such a process is a queuing system, with the state being the number of items currently in the queue. Thus, the state space $S$ is $\{0, 1, \ldots, n\}$ if the queue has finite capacity $n$ and $\mathbb{Z}^*$ if it has infinite capacity. The state changes at the occurrence of an event representing the arrival or departure of an item. We call this a discrete event system because the state change is discrete rather than continuous and is caused by the triggering of an event.

2.3.1  Trajectories 

A random variable $X_t \in X$ represents the chance experiment of observing the stochastic process $X$ at time $t$. If we record our observations at consecutive time points for all $t \in T$, then we have a trajectory, or sample path, for $X$. Our work in probabilistic model checking is centered around the verification of temporal logic formulae over trajectories for stochastic discrete event systems. The terminology and notation introduced here are used extensively in later chapters.

Figure 2.14: A trajectory for a simple queuing system with three arrival events and one departure event. The state of the system represents the number of items in the queue.

Definition 2.2 (Trajectory). A trajectory for a stochastic process $X$ is any set of observations $\{x_t \in S \mid t \in T\}$ of the random variables $X_t \in X$.


The trajectory of a stochastic discrete event system is piecewise constant and can therefore be represented as a sequence $\sigma = \{(s_0, t_0), (s_1, t_1), \ldots\}$, with $s_i \in S$ and $t_i \in T \setminus \{0\}$. Zero is excluded to ensure that only a single state can be occupied at any point in time. Figure 2.14 plots part of a trajectory for a simple queuing system. Let

$$T_i = \sum_{j=0}^{i-1} t_j, \tag{2.18}$$

i.e. $T_i$ is the time at which state $s_i$ is entered and $t_i$ is the duration of time for which the process remains in $s_i$ before an event triggers a transition to state $s_{i+1}$. A trajectory $\sigma$ is then a set of observations of $X$ with $x_t = s_i$ for $T_i \leq t < T_i + t_i$. According to this definition, trajectories of stochastic discrete event systems are right-continuous. A finite trajectory is a sequence $\sigma = \{(s_0, t_0), \ldots, (s_n, \infty)\}$ where $s_n$ is an absorbing state, meaning that no events can occur in $s_n$ and that $x_t = s_n$ for all $t \geq T_n$.
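A piecewise-constant trajectory prefix is easy to represent directly. In the Python sketch below, the entry times $T_i$ of (2.18) are precomputed, and $x_t$ is found by a binary search for the last $T_i \leq t$. This yields exactly the right-continuous convention $x_t = s_i$ for $T_i \leq t < T_i + t_i$. The class and method names are hypothetical. The example trajectory is made up for illustration and is not the one plotted in Figure 2.14.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Any, List, Sequence, Tuple


class TrajectoryPrefix:
    """Finite prefix {(s_0, t_0), ..., (s_k, t_k)} of a piecewise-constant trajectory."""

    def __init__(self, steps: Sequence[Tuple[Any, float]]):
        if not steps or any(t <= 0 for _, t in steps):
            raise ValueError("need at least one step, with positive holding times")
        self.states: List[Any] = [s for s, _ in steps]
        ends = list(accumulate(t for _, t in steps))  # T_i + t_i
        self.entry_times = [0.0] + ends[:-1]          # T_i, cf. (2.18)
        self.end_time = ends[-1]

    def state_at(self, t: float) -> Any:
        """Return x_t, i.e. s_i for T_i <= t < T_i + t_i."""
        if not 0.0 <= t < self.end_time:
            raise ValueError("t lies outside the prefix")
        return self.states[bisect_right(self.entry_times, t) - 1]


# Queue lengths 0 -> 1 -> 2 -> 1 with holding times 1.0, 0.5, 2.0, 1.5.
queue = TrajectoryPrefix([(0, 1.0), (1, 0.5), (2, 2.0), (1, 1.5)])
print([queue.state_at(t) for t in (0.0, 0.99, 1.0, 1.5, 3.49, 3.5)])
```

To model an absorbing final state, give the last step an infinite holding time (`math.inf`). `state_at` then accepts every $t \geq 0$.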

An infinite trajectory is convergent if $\sum_{i=0}^{\infty} t_i < \infty$. In this case, $x_t$ is not well-defined for all $t \in T$. For a trajectory to be convergent, however, an infinite sequence of events must occur in a finite amount of time, which is unrealistic for any physical system. Hoel et al. (1972) use the term explosive to describe processes for which such sequences can occur with non-zero probability. It is common to assume time divergence for infinite trajectories of real-time systems (cf. Alur and Dill 1994), i.e. that the systems are non-explosive, and most finite-state systems satisfy this property by default.

2.3.2  Measurable  Stochastic  Discrete  Event  Systems 

Of utmost importance to probabilistic model checking is the definition of a probability measure over sets of trajectories for a system. The set of trajectories must be measurable. Formally, a measurable space is a set $\Omega$ with a $\sigma$-algebra $\mathcal{F}_\Omega$ of subsets of $\Omega$ (Halmos 1950). A probability space is a measurable space $(\Omega, \mathcal{F}_\Omega)$ and a probability measure $\mu$. When we say that a set $\Omega$ must be measurable, we really mean that there must be a $\sigma$-algebra for the set. The elements of this $\sigma$-algebra are the measurable subsets of $\Omega$.

For stochastic discrete event systems, the elements of the $\sigma$-algebra are sets of trajectories with common prefix. A prefix of a trajectory $\sigma = \{(s_0, t_0), (s_1, t_1), \ldots\}$ is a sequence $\sigma_{\leq\tau} = \{(s'_0, t'_0), \ldots, (s'_k, t'_k)\}$, with $s'_i = s_i$ for all $i \leq k$, $\sum_{i=0}^{k} t'_i = \tau$, $t'_i = t_i$ for all $i < k$, and $t'_k < t_k$. Let $\mathit{Path}(\sigma_{\leq\tau})$ denote the set of trajectories with common prefix $\sigma_{\leq\tau}$. This set must be measurable, and we assume that a probability measure $\mu$ over the set of trajectories with common prefix exists. For our work on probabilistic model checking, we assume only that we can generate sample trajectory prefixes distributed according to $\mu$.

A probability measure $\mu$ over sets of trajectories with common prefix can be defined for virtually all systems of practical interest, although the precise definition thereof will of course depend on the specific probability structure of the stochastic discrete event system being studied. In general, a stochastic discrete event system is measurable if the sets $S$ and $T$ are measurable. We can show this by defining a $\sigma$-algebra over the set of trajectories with common prefix $\sigma_{\leq\tau} = \{(s_0, t_0), \ldots, (s_k, t_k)\}$, denoted $\mathit{Path}(\sigma_{\leq\tau})$, as follows. Let $\mathcal{F}_S$ be a $\sigma$-algebra over the state space $S$, and let $\mathcal{F}_T$ be a $\sigma$-algebra over the index set $T$ of the stochastic process. Such $\sigma$-algebras exist if $S$ and $T$ are measurable sets, which by assumption they are. Then $C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$, with $S_i \in \mathcal{F}_S$ and $I_i \in \mathcal{F}_T$, denotes the set of trajectories $\sigma' = \{(s'_0, t'_0), (s'_1, t'_1), \ldots\}$ such that $s'_i = s_i$ for $i \leq k$, $s'_i \in S_i$ for $k < i \leq n$, $t'_i = t_i$ for $i < k$, $t'_k > t_k$, and $t'_i \in I_i$ for $k \leq i < n$. In other words, $C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$ is a subset of $\mathit{Path}(\sigma_{\leq\tau})$. The sets $C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$ are the elements of a $\sigma$-algebra over the set $\mathit{Path}(\sigma_{\leq\tau})$ with set operations applied element-wise, for example

$$C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n) \cup C(\sigma_{\leq\tau}, I'_k, S'_{k+1}, \ldots, I'_{n-1}, S'_n) = C(\sigma_{\leq\tau}, I_k \cup I'_k, S_{k+1} \cup S'_{k+1}, \ldots, I_{n-1} \cup I'_{n-1}, S_n \cup S'_n).$$
2.3.3 Structured Stochastic Discrete Event Systems

So  far,  we  have  defined  stochastic  discrete  event  systems  in  rather  general  terms  as  any  stochastic  process 
with  piecewise  constant  trajectories.  Most  stochastic  discrete  event  systems  of  interest  have  more  structure 
than  that.  Any  additional  structure  simplifies  the  specification  of  a  stochastic  discrete  event  system  and  can 
often  be  exploited  in  the  analysis  of  such  systems. 

The probability measure on sets of trajectories for a stochastic discrete event system can be expressed using a holding time distribution with probability density function $h(\cdot\,; \sigma_{\leq\tau})$ and a next-state distribution $p(\cdot\,; \sigma_{\leq\tau}, t)$. The probability measure for $C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$ can now be defined recursively as

$$\mu\big(C(\sigma_{\leq\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)\big) = \int_{I_k} h(t_k + t; \sigma_{\leq\tau}) \int_{S_{k+1}} p(s; \sigma_{\leq\tau}, t)\, \mu\big(C(\sigma_{\leq\tau} \oplus (t, s), I_{k+1}, S_{k+2}, \ldots, I_{n-1}, S_n)\big)\, ds\, dt, \tag{2.19}$$

where $\{(s_0, t_0), \ldots, (s_k, t_k)\} \oplus (t, s) = \{(s_0, t_0), \ldots, (s_k, t_k + t), (s, 0)\}$. The base case for the recursive definition is $\mu(C(\sigma_{\leq\tau})) = 1$. This is a factored representation of the probability measure $\mu$.

In addition to structure in the probability measure on sets of trajectories, we can also have structure in the
state space. Instead of a flat state representation, it is often natural to describe the state of a system by using
multiple state variables, which leads to a factored state space. A factored representation of the state space
$S$ of a measurable stochastic discrete event system is a set of state variables $SV$ and a value assignment
function $V(s,x)$ providing the value of $x \in SV$ in state $s$. The domain of $x$ is the set $D_x = \{V(s,x) : s \in S\}$
of possible values that $x$ can take on. A tuple $(S, T, \mu, SV, V)$ represents a measurable stochastic discrete
event system with a factored state space. Note that $|S|$ is at most $\prod_{x \in SV} |D_x|$, which is exponential in
the number of state variables, but the actual size of $S$ can of course be smaller than $\prod_{x \in SV} |D_x|$ if certain
combinations of variable assignments do not correspond to an actual state $s \in S$.
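As a small illustration of the bound $|S| \le \prod_{x \in SV} |D_x|$, the Python sketch below enumerates all assignments to two hypothetical state variables and keeps only those that pass a hypothetical consistency check. The variable names, domains, and the `is_valid` constraint are assumptions made for the example, not part of the formalism.

```python
from itertools import product
from math import prod

# Hypothetical state variables SV and their domains D_x.
DOMAINS = {
    "queue_length": [0, 1, 2, 3],
    "server_busy": [False, True],
}


def V(s, x):
    """Value assignment function: the value of state variable x in state s."""
    return s[x]


def is_valid(s):
    # Hypothetical constraint: the server is busy exactly when the queue is non-empty.
    return V(s, "server_busy") == (V(s, "queue_length") > 0)


names = list(DOMAINS)
candidates = (dict(zip(names, values)) for values in product(*(DOMAINS[x] for x in names)))
states = [s for s in candidates if is_valid(s)]

upper_bound = prod(len(DOMAINS[x]) for x in names)  # product of |D_x| over SV
print(len(states), upper_bound)  # 4 8
```

Half of the eight variable assignments are ruled out by the constraint, so the actual state space is strictly smaller than the product bound.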

We will now discuss a few common models of stochastic discrete event systems with specific structural
properties. By making limiting assumptions regarding the shape of the probability density functions
$h(\cdot;\sigma_{\le\tau})$ and $p(\cdot;\sigma_{\le\tau},t)$, we enable a succinct representation of $\mu$. This is important for efficient generation
of sample trajectories for stochastic discrete event systems, which is a large component of our statistical
model checking algorithm. We include a brief description of Markov and semi-Markov processes. More
detailed accounts of this topic are provided by, for example, Kolmogoroff (1931), Doob (1953), Bartlett
(1966), Howard (1971a, 1971b), and Çinlar (1975).
Markov Processes

A  stochastic  discrete  event  system  is  a  time  homogeneous  Markov  process  if  the  future  behavior  at  any  point 
in  time  depends  only  on  the  state  at  that  point  in  time,  and  not  in  any  way  on  how  that  state  was  reached. 
This  implies  that  the  probability  measure  on  sets  of  trajectories  satisfies  the  following  property: 

$$\mu\bigl(Path(\{(s_0,t_0),\ldots,(s_k,t_k)\})\bigr) = \mu\bigl(Path(\{(s_k,0)\})\bigr) \tag{2.20}$$

Equation 2.20 is known as the Markov property, named after the Russian mathematician A. A. Markov,
who in the early 1900s systematically studied discrete-time stochastic processes satisfying this property.
A Markov process is time inhomogeneous if the distribution over future trajectories depends on the time of
observation, in addition to the current state.

For a factored representation of $\mu$, condition (2.20) holds if and only if $h(t_k + t;\sigma_{\le\tau}) = h(t;s_k)$ and
$p(\cdot;\sigma_{\le\tau},t) = p(\cdot;s_k)$ for all trajectory prefixes $\sigma_{\le\tau} = \{(s_0,t_0),\ldots,(s_k,t_k)\}$. The first condition implies
that $h(\cdot;\sigma_{\le\tau})$ is a memoryless distribution. Thus, a discrete-time Markov process has a geometric holding
time distribution in each state: the probability that the process remains in state $s$ for exactly $t$ time units
before a state transition occurs is $h(t;s) = q_s(1-q_s)^{t-1}$ for some $q_s \in [0,1]$. The dynamics of a discrete-time
Markov process with state space $S$ are fully specified by $q_s$ and $p(\cdot;s)$ for each state $s \in S$. If $S$ is
countable, then the dynamics are captured by a state transition probability matrix $P$ with elements

$$P_{ij} = \begin{cases} 1 - q_i\bigl(1 - p(i;i)\bigr) & \text{if } i = j \\ q_i\, p(j;i) & \text{if } i \neq j, \end{cases}$$

where $P_{ij}$ is the probability that the discrete-time Markov process occupies state $j$ at time $t+1$ when the
process occupies state $i$ at time $t$.
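The following Python sketch builds $P$ from the geometric parameters $q_i$ and the next-state probabilities $p(j;i)$ exactly as in the case split above, so each row is a probability distribution. The three-state chain is a hypothetical, truncated queue chosen only for illustration; its numbers are not those of Example 2.6.

```python
def dtmc_matrix(q, p):
    """Transition matrix of a discrete-time Markov process.

    q[i]    -- parameter of the geometric holding time distribution in state i
    p[i][j] -- next-state probability p(j; i), given that a transition occurs
    """
    n = len(q)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Stay put if no transition occurs, or if the transition is a self-loop.
                P[i][j] = 1.0 - q[i] * (1.0 - p[i][i])
            else:
                P[i][j] = q[i] * p[i][j]
    return P


# Hypothetical three-state queue (capacity 2), for illustration only.
q = [0.25, 0.5, 0.5]
p = [
    [0.0, 1.0, 0.0],  # empty queue: only arrivals
    [0.5, 0.0, 0.5],  # departure or arrival, equally likely
    [0.0, 1.0, 0.0],  # full queue: only departures
]
P = dtmc_matrix(q, p)
print(P)  # [[0.75, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
```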

Example 2.6. Consider a simple queuing system, and let $s_i$, $i \ge 0$, denote the state with $i$ items in the
queue. Assume that the holding time in $s_0$ is geometrically distributed with parameter $q_0$ and the
holding time in all other states is geometrically distributed with a common parameter $q_i$. The expected holding
time in $s_0$ is greater than in the other states because no departures can occur in $s_0$. Furthermore, assume that
a state transition in $s_i$, for $i > 0$, is caused either by a departure or by an arrival, each with a fixed probability.
The resulting discrete-time Markov process is depicted in Figure 2.15.
Figure 2.15: A discrete-time Markov process representing a queuing system. The arcs are labeled with the
entries of the state transition probability matrix for the process.

Figure 2.16: A continuous-time Markov process representing a queuing system. The arcs are labeled with the
entries of the infinitesimal generator matrix for the process.


For continuous-time Markov processes, the holding time in state $s$ is exponentially distributed: $h(t;s) =
\lambda_s e^{-\lambda_s t}$. The parameter $\lambda_s$ is the exit rate for state $s$. The probability that a state transition occurs in the
next $t'$ time units is $\int_0^{t'} \lambda_s e^{-\lambda_s t}\,dt = 1 - e^{-\lambda_s t'}$. The dynamics of a continuous-time Markov process with
countable state space can be fully characterized by a matrix $Q$ with elements

$$Q_{ij} = \begin{cases} -\lambda_i\bigl(1 - p(i;i)\bigr) & \text{if } i = j \\ \lambda_i\, p(j;i) & \text{if } i \neq j. \end{cases}$$

The matrix $Q$ is typically referred to as the infinitesimal generator of a continuous-time Markov process
(Puterman 1994, p. 561).
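A matching sketch for the continuous-time case builds $Q$ from the exit rates $\lambda_i$ and next-state probabilities $p(j;i)$; every row of the resulting generator sums to zero. The three-state truncated queue (arrivals at rate 1, departures at rate 1/2, hypothetical capacity 2) is an assumption made for illustration.

```python
def generator_matrix(rates, p):
    """Infinitesimal generator of a continuous-time Markov process.

    rates[i] -- exit rate lambda_i of state i
    p[i][j]  -- next-state probability p(j; i)
    """
    n = len(rates)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                Q[i][j] = -rates[i] * (1.0 - p[i][i])
            else:
                Q[i][j] = rates[i] * p[i][j]
    return Q


# Hypothetical truncated queue: arrivals at rate 1, departures at rate 1/2, capacity 2.
rates = [1.0, 1.5, 0.5]
p = [
    [0.0, 1.0, 0.0],
    [1 / 3, 0.0, 2 / 3],
    [0.0, 1.0, 0.0],
]
Q = generator_matrix(rates, p)
```

Each off-diagonal entry $\lambda_i p(j;i)$ recovers the rate of the event that moves the process from $i$ to $j$, for example 1/2 for a departure out of state 1.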


Example 2.7. Consider a queuing system similar to that in Example 2.6, but with time as a continuous
quantity. The holding time in $s_i$ is exponentially distributed with rate 1 for $i = 0$ and $\frac{3}{2}$ for $i > 0$. In $s_i$,
for $i > 0$, a state transition is caused by a departure with probability $\frac{1}{3}$ and by an arrival with probability $\frac{2}{3}$.
Figure 2.16 shows the resulting continuous-time Markov process.


As  was  mentioned  earlier,  it  is  common  to  assume  time  divergence  for  infinite  trajectories  of  stochastic 
discrete event systems, i.e., that the system is non-explosive. Obviously, any discrete-time Markov process is
non-explosive because there is always at least a unit delay between state transitions. It can be shown that a
sufficient condition for a continuous-time Markov process to be non-explosive is that there exists a constant
$c$ such that $\lambda_s < c$ for all $s \in S$ (cf. Baier et al. 2003, Prop. 1). As a direct consequence, all finite-state time
homogeneous  Markov  processes  are  non-explosive.  Not  all  infinite-state  continuous-time  Markov  processes 
are  non-explosive,  however,  as  the  following  example  illustrates. 


Example 2.8. Consider the continuous-time Markov process depicted in Figure 2.17, with an infinite state
space $S = \{s_0, s_1, \ldots\}$ and exit rates $\lambda_i = 2^{2(i+1)}$. The exit rates for this Markov process rapidly increase
with each state transition. Any trajectory with a holding time in the interval $(0, 2^{-(i+1)}]$ in state $s_i$, for each
$i \ge 0$, is convergent because the total time never exceeds 1. The probability measure for the set of all such
trajectories is

$$\prod_{i=0}^{\infty}\left(1 - e^{-\lambda_i 2^{-(i+1)}}\right) = \prod_{i=1}^{\infty}\left(1 - e^{-2^{i}}\right).$$

This infinite product converges to a value approximately equal to 0.849. Thus, the probability measure of
the set of convergent trajectories is non-zero, which means that the Markov process is explosive.

Figure 2.17: An explosive continuous-time Markov process.
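The value 0.849 can be checked numerically. The sketch below multiplies the first few factors $1 - e^{-\lambda_i 2^{-(i+1)}}$; because the factors approach 1 doubly exponentially fast, ten terms are more than enough at double precision. The function name and the truncation at ten terms are choices made for this illustration.

```python
import math


def convergent_probability(terms=10):
    """Truncated product over i of P[holding time in s_i <= 2**-(i+1)]."""
    result = 1.0
    for i in range(terms):
        rate = 2.0 ** (2 * (i + 1))   # exit rate lambda_i of state s_i
        bound = 2.0 ** -(i + 1)        # time budget in s_i; the budgets sum to at most 1
        result *= 1.0 - math.exp(-rate * bound)  # exponential CDF evaluated at the budget
    return result


print(round(convergent_probability(), 3))  # 0.849
```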

In order to simulate the execution of a Markov process, we need to be able to sample from the next-state
distribution of any state. If we are to simulate execution for an extended period of time, we need a
long sequence of pseudorandom numbers. Unless we are careful in our choice of pseudorandom number
generator, subtle correlations in pseudorandom number sequences may be a source of systematic error in the
analysis of the simulation output (Ferrenberg et al. 1992). The Mersenne Twister (Matsumoto and Nishimura
1998), with its exceptionally long period, is thought to be a suitable pseudorandom number generator for
simulation studies of stochastic processes.
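As a sketch of such a simulation, the Python code below generates a trajectory of a continuous-time Markov process as a list of (state, holding time) pairs, mirroring the $(s_i, t_i)$ notation used above. CPython's `random` module is built on the Mersenne Twister, and seeding a private `random.Random` instance keeps runs reproducible. The queue dynamics (exit rate 1 in the empty state and 3/2 otherwise, departure probability 1/3) are illustrative assumptions.

```python
import random


def simulate_ctmc(exit_rate, next_state_dist, s0, horizon, seed=0):
    """Sample (state, holding time) pairs until the elapsed time reaches horizon.

    The last holding time may extend past the horizon.
    """
    rng = random.Random(seed)  # CPython's random.Random is a Mersenne Twister (MT19937)
    s, elapsed, path = s0, 0.0, []
    while elapsed < horizon:
        holding = rng.expovariate(exit_rate(s))  # exponential holding time in s
        path.append((s, holding))
        elapsed += holding
        successors, probs = zip(*next_state_dist(s).items())
        s = rng.choices(successors, weights=probs)[0]  # sample from p(.; s)
    return path


# Illustrative queue: from the empty state we can only move up; otherwise a
# departure happens with probability 1/3 and an arrival with probability 2/3.
def exit_rate(i):
    return 1.0 if i == 0 else 1.5


def next_state_dist(i):
    return {1: 1.0} if i == 0 else {i - 1: 1 / 3, i + 1: 2 / 3}


path = simulate_ctmc(exit_rate, next_state_dist, s0=0, horizon=20.0, seed=42)
```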

Semi-Markov  Processes 

Not all phenomena in nature are accurately captured by memoryless distributions. The lifetime of a system
component,  for  example,  is  often  best  modeled  using  a  Weibull  distribution  (Nelson  1985).  The  Weibull 
distribution  can  be  used  to  model  increasing  failure  rates,  for  example  representing  increasing  likelihood  of 
failure  due  to  wear,  as  well  as  decreasing  failure  rates. 

A semi-Markov process is a stochastic process for which, in order to accurately predict future behavior,
one may need to know not only the current state but also the amount of time spent in that state (although
it is still inconsequential how the current state was reached). We can state the semi-Markov property as a
constraint on the probability measure $\mu$:

$$\mu\bigl(Path(\{(s_0,t_0),\ldots,(s_k,t_k)\})\bigr) = \mu\bigl(Path(\{(s_k,t_k)\})\bigr) \tag{2.21}$$
The probability measure over sets of trajectories for a semi-Markov process can be represented by a
holding time distribution $h(\cdot;s_k,t_k)$ and a next-state distribution $p(\cdot;s_k,t)$. The probability of transitioning
out of state $s_k$ within $t$ time units, provided that we have already been in $s_k$ for $t_k$ time units, is given by
$\int_0^t h(t_k + x;s_k,t_k)\,dx$. Given that a state transition occurs after $t$ time units in state $s_k$, the probability that the
next state belongs to the set $S'$ is $\int_{S'} p(s;s_k,t)\,ds$.
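Equivalently, if $F$ denotes the cumulative distribution function of the total holding time in $s_k$, the probability of leaving within $t$ more time units after $t_k$ time units in the state is $(F(t_k + t) - F(t_k))/(1 - F(t_k))$. The Python sketch below evaluates this for Weibull distributions with scale 1; the helper names are illustrative. With shape 1 (the exponential distribution) the result does not depend on the elapsed time, while with shape 1.5 it grows with the elapsed time, reflecting an increasing failure rate.

```python
import math


def weibull_cdf(x, shape, scale=1.0):
    """CDF of a Weibull distribution W(scale, shape)."""
    if x <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((x / scale) ** shape))


def prob_leave_within(t, elapsed, cdf):
    """P[transition within t more time units | already `elapsed` time units in the state]."""
    survival = 1.0 - cdf(elapsed)
    return (cdf(elapsed + t) - cdf(elapsed)) / survival


def exponential(x):
    return weibull_cdf(x, shape=1.0)  # W(1, 1) is the exponential distribution with rate 1


def wear_out(x):
    return weibull_cdf(x, shape=1.5)  # W(1, 1.5), as in Example 2.9


print(round(prob_leave_within(0.5, 0.0, exponential), 4),
      round(prob_leave_within(0.5, 2.0, exponential), 4))  # 0.3935 0.3935
```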

Example  2.9.  Consider  a  computer  system  that  can  be  in  one  of  two  states:  running  or  crashed.  The  uptime 
is modeled by a standard Weibull distribution with shape parameter 1.5, denoted $W(1, 1.5)$. This means that
the likelihood of a crash increases with time. When crashed, the system can be brought back to the running
state through a reboot. The reboot time is uniformly distributed in the interval $(1, 2)$. This computer system
is a semi-Markov process because the holding time distributions are not memoryless.

To simulate execution of a semi-Markov process, we need to be able to generate non-uniform pseudorandom
numbers. Typical pseudorandom number generators produce observations for a random variable
$U$ with a uniform distribution $U(0,1)$. We can transform these observations into pseudorandom numbers
distributed according to an arbitrary distribution function $F(x)$. The random variable $X = F^{-1}(U)$, where
$F^{-1}$ is the inverse of $F$, has distribution function $F(x)$, so an observation $u$ of $U$ can be transformed into
an observation $x = F^{-1}(u)$ of $X$ (von Neumann 1951). For example, the exponential distribution has
cumulative distribution function $F(x) = 1 - e^{-\lambda x}$, so we can use $x = -\log(1-u)/\lambda$ as a sample from
the exponential distribution. The inverse method works well for many common probability distributions for
which the inverse of $F(x)$ can be computed efficiently. Various other methods for generating non-uniform
pseudorandom numbers are described by Devroye (1986).
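The inverse method can be sketched in a few lines of Python. The exponential case uses the formula above; the Weibull case, applied here to the $W(1, 1.5)$ uptime distribution of Example 2.9, inverts $F(x) = 1 - e^{-(x/\alpha)^\beta}$ to $x = \alpha(-\log(1-u))^{1/\beta}$ with scale $\alpha$ and shape $\beta$. The function names are illustrative; Python's standard library also offers `random.expovariate` and `random.weibullvariate` for these two distributions.

```python
import math
import random


def sample_exponential(rate, rng):
    u = rng.random()                   # observation of U ~ U(0, 1), in [0, 1)
    return -math.log(1.0 - u) / rate   # x = F^{-1}(u) for F(x) = 1 - exp(-rate * x)


def sample_weibull(scale, shape, rng):
    u = rng.random()
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)  # F^{-1}(u) for W(scale, shape)


rng = random.Random(1)
n = 100_000
exp_mean = sum(sample_exponential(2.0, rng) for _ in range(n)) / n
weib_mean = sum(sample_weibull(1.0, 1.5, rng) for _ in range(n)) / n
# Theoretical means: 1 / rate = 0.5 and scale * Gamma(1 + 1 / shape), roughly 0.903.
print(round(exp_mean, 2), round(weib_mean, 2))
```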

Generalized  Semi-Markov  Processes 

Both  Markov  and  semi-Markov  processes  can  be  used  to  model  a  wide  variety  of  stochastic  discrete  event 
systems, but without emphasis on the event structure. The queuing systems in Examples 2.6 and 2.7 are
naturally  described  as  having  arrival  and  departure  events,  although  the  Markov  processes  we  use  to  model 
the  systems  represent  only  the  joint  effects  of  all  events  enabled  in  a  state.  The  generalized  semi-Markov 
process  (GSMP),  first  introduced  by  Matthes  (1962),  is  an  established  formalism  in  queuing  theory  for 
modeling  stochastic  discrete  event  systems  with  focus  on  the  event  structure  of  a  system  (Glynn  1989). 
A GSMP consists of a set of states $S$ and a set of events $E$. At any time, the process occupies some
state $s \in S$ in which a subset $E_s$ of the events are enabled. Associated with each event $e \in E$ is a positive
trigger time distribution $G_e$ and a next-state distribution $p_e(\cdot;\sigma_{\le\tau},t)$. The probability density function
for $G_e$, $h_e(\cdot;\sigma_{\le\tau})$, can depend on the entire execution history, which separates GSMPs from semi-Markov
processes. Let $T_e$ be a random variable representing the trigger time of $e$. If $e$ just became enabled, then
$\Pr[T_e \le t \mid \sigma_{\le\tau}] = H_e(t;\sigma_{\le\tau}) = \int_0^t h_e(x;\sigma_{\le\tau})\,dx$ is the probability that $e$ triggers within $t$ time units,
provided that $e$ remains continuously enabled. If $e$ has already been enabled for $u_e$ time units, then the
probability of $e$ triggering in the next $t$ time units is

$$\Pr[T_e \le t + u_e \mid T_e > u_e, \sigma_{\le\tau}] = 1 - \Pr[T_e > t + u_e \mid T_e > u_e, \sigma_{\le\tau}] = 1 - \frac{1 - H_e(t + u_e;\sigma_{\le\tau})}{1 - H_e(u_e;\sigma_{\le\tau})}. \tag{2.22}$$

By taking the derivative of (2.22) with respect to $t$ we get

$$h_e(t, u_e;\sigma_{\le\tau}) = \frac{h_e(t + u_e;\sigma_{\le\tau})}{1 - H_e(u_e;\sigma_{\le\tau})}, \tag{2.23}$$

which is the conditional probability density function for the distribution $G_e$. The enabled events in a state
race to trigger first, and the event that triggers causes a transition to a state $s' \in S$ according to the next-state
distribution for the triggering event.

Example 2.10. Consider a queuing system with infinite capacity, where the state of the system is simply the
number of items currently in the queue. There is an arrival event $a$, enabled in every state, that has an
exponential trigger time distribution with rate 1. There is also a departure event $d$ that is enabled in states
$i > 0$. This event has an exponential trigger time distribution with rate $\frac{1}{2}$. This queuing system is a
GSMP with state space $S = \mathbb{Z}^*$ and event set $E = \{a, d\}$. Furthermore, we have $E_0 = \{a\}$, $E_i = E$ for
$i > 0$, $h_a(t) = e^{-t}$ and $p_a(i+1;i) = 1$ for all $i \in S$, $h_d(t) = \frac{1}{2}e^{-t/2}$, and $p_d(i-1;i) = 1$ for all $i > 0$.

For many stochastic discrete event systems, the trigger time and next-state distributions do not depend
on every aspect of an entire trajectory prefix, as is clearly the case in Example 2.10. Let $u_e$, for each $e \in E$,
represent the time that $e$ has been continuously enabled without triggering. If $h_e(\cdot;\sigma_{\le\tau}) = h_e(\cdot;s_k,u_e)$
and $p_e(\cdot;\sigma_{\le\tau},t) = p_e(\cdot;s_k)$ for all $e \in E$, then we have a time homogeneous GSMP (Glynn 1989, p. 18).
A time homogeneous GSMP where all events have an exponential trigger time distribution with rate $\lambda_e$
is also a time homogeneous Markov process. The holding time distribution for state $s$ is an exponential
distribution with rate $\lambda_s = \sum_{e \in E_s} \lambda_e$, and the transition probabilities are $p(s';s) = \sum_{e \in E_s} p_e(s';s)\,\lambda_e/\lambda_s$.
The stochastic discrete event systems in Examples 2.7 and 2.10 are in fact equivalent.
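The reduction from an exponential-event GSMP to a Markov process can be sketched directly from these two formulas. The Python code below encodes a queue in the style of Example 2.10, using rate 1 for arrivals and rate 1/2 for departures; the function names and the dictionary encoding are illustrative choices.

```python
def markov_from_exponential_gsmp(s, enabled, rate, next_state):
    """Exit rate lambda_s and next-state distribution p(.; s) of a GSMP
    whose events all have exponential trigger time distributions."""
    events = enabled(s)
    exit_rate = sum(rate[e] for e in events)  # lambda_s = sum of lambda_e over E_s
    probs = {}
    for e in events:
        for s2, pe in next_state(e, s).items():
            # p(s'; s) = sum over e of p_e(s'; s) * lambda_e / lambda_s
            probs[s2] = probs.get(s2, 0.0) + pe * rate[e] / exit_rate
    return exit_rate, probs


RATE = {"a": 1.0, "d": 0.5}  # arrival and departure rates (illustrative)


def enabled(i):
    return ("a", "d") if i > 0 else ("a",)


def next_state(e, i):
    return {i + 1: 1.0} if e == "a" else {i - 1: 1.0}


print(markov_from_exponential_gsmp(0, enabled, RATE, next_state))  # (1.0, {1: 1.0})
print(markov_from_exponential_gsmp(3, enabled, RATE, next_state))
```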

To simulate the execution of a time homogeneous GSMP model, we associate a real-valued clock $t_e$
with each event $e$ that indicates the time remaining until $e$ is scheduled to trigger in the current state. The
process starts in some initial state $s$ with events $E_s$ enabled. For each enabled event $e \in E_s$, we sample a
trigger time according to $G_e$ and set $t_e$ to the sampled value. For disabled events, we set $t_e = \infty$. Let $e^*$
be the event in $E_s$ with the smallest clock value. This becomes the triggering event in $s$. Provided that at
most one of the trigger time distributions is not continuous, the probability of two events triggering at exactly the
same time is zero, so $e^*$ is uniquely defined (Glynn 1989, p. 17). When $e^*$ triggers after $t_{e^*}$ time units in $s$,
we sample a successor state $s'$ according to $p_{e^*}(\cdot;\{(s,0)\},t_{e^*})$ and update each clock $t_e$ as follows:

1. if $e \in E_{s'} \cap (\{e^*\} \cup (E \setminus E_s))$, then $t'_e$ is sampled from $G_e$;

2. if $e \in E_{s'} \cap (E_s \setminus \{e^*\})$, then $t'_e = t_e - t_{e^*}$;

3. otherwise, if $e \notin E_{s'}$, then $t'_e = \infty$.

The first rule covers events that are enabled in $s'$ and either triggered or were not enabled in $s$. All such events
are rescheduled. Events that remain enabled across state transitions without triggering are not rescheduled
(rule 2). The final rule states that events disabled in $s'$ are scheduled not to trigger. Given a new state $s'$ and
new clock values $t'_e$ for each $e \in E$, we repeat the procedure just specified with $s = s'$ and $t_e = t'_e$ so long as
$E_s \neq \emptyset$. Enabled events, annotated by a scheduled trigger time, can be stored in a heap to accommodate fast
retrieval of $e^*$ (Gonnet 1976). McCormack and Sargent (1981) compare various data structures for storing
event schedules. Discrete event simulation is further discussed by Bratley et al. (1987) and Shedler (1993).
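The clock-update rules can be sketched as follows. For brevity, this Python code finds $e^*$ with a linear scan over the enabled clocks instead of the heap mentioned above, and it represents $t_e = \infty$ by simply leaving disabled events out of the clock table. As a hypothetical example it simulates a queue with exponential arrivals (rate 1) and Weibull $W(1, 1.5)$ departure times; because the departure clock keeps running across arrivals, this model is naturally expressed as a GSMP. All names are illustrative.

```python
import random


def simulate_gsmp(s0, enabled, sample_trigger, sample_next, t_max, seed=0):
    """Simulate a time homogeneous GSMP up to time t_max.

    Returns a list of (entry time, state) pairs. Disabled events have no
    entry in `clocks`, which plays the role of t_e = infinity.
    """
    rng = random.Random(seed)
    s, now = s0, 0.0
    clocks = {e: sample_trigger(e, s, rng) for e in enabled(s)}
    path = [(now, s)]
    while clocks:                                  # stop when E_s is empty
        e_star = min(clocks, key=clocks.get)       # event with the smallest clock
        dt = clocks[e_star]
        if now + dt > t_max:
            break
        now += dt
        s_next = sample_next(e_star, s, rng)
        new_clocks = {}
        for e in enabled(s_next):
            if e == e_star or e not in clocks:     # rule 1: triggered or newly enabled
                new_clocks[e] = sample_trigger(e, s_next, rng)
            else:                                  # rule 2: still enabled, keep running
                new_clocks[e] = clocks[e] - dt
        # Rule 3: events not enabled in s_next are dropped from the table.
        s, clocks = s_next, new_clocks
        path.append((now, s))
    return path


# Hypothetical queue: exponential arrivals "a" (rate 1), Weibull W(1, 1.5) departures "d".
def enabled(i):
    return ("a", "d") if i > 0 else ("a",)


def sample_trigger(e, i, rng):
    return rng.expovariate(1.0) if e == "a" else rng.weibullvariate(1.0, 1.5)


def sample_next(e, i, rng):
    return i + 1 if e == "a" else i - 1


path = simulate_gsmp(0, enabled, sample_trigger, sample_next, t_max=50.0, seed=7)
```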

2.4  Stochastic  Decision  Processes 

So  far,  we  have  discussed  stochastic  processes  with  a  fixed  structure.  Now,  let  us  consider  the  case  when  a 
decision  maker  can  influence  the  structure  and  dynamics  of  the  process,  to  some  degree,  and  wants  to  select 
a  structure  that  achieves  some  design  objective.  We  then  have  a  stochastic  decision  process. 

The  most  widely  adopted  stochastic  decision  process  is  the  Markov  decision  process  (MDP;  Bellman 
1957;  Howard  1960,  1971b;  Puterman  1994;  Boutilier  et  al.  1999).  The  dynamics  of  a  discrete-time  Markov 
process are captured by a transition probability matrix $P$. For an MDP, there are multiple transition probability
matrices that a decision maker may choose from at each stage during execution. Each choice corresponds
to an action on behalf of the decision maker. A transition probability matrix $P^a$ represents the behavior of
the system in the next time step if action $a$ is chosen by the decision maker. For continuous-time MDPs, an
action is instead represented by an infinitesimal generator matrix $Q^a$.

The decision maker designs a policy, denoted $\pi$, which is a mapping from situations to actions.⁵ A
situation can constitute the entire execution history (history-dependent policy), the current time and state
(time-dependent policy), or just the current state (stationary policy). A policy may designate a fixed action
to be used in a situation (deterministic policy), or a distribution over actions (randomized policy). The policy
fixes the dynamics of a system, and an MDP coupled with a policy is a Markov process. For example, given
a stationary randomized policy $\pi$ and a set of actions $A$, the probability of transitioning from state $i$ to $j$ in
the next time step is $\int_A P^a_{ij}\,d\pi(i)$.
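For a finite action set the integral reduces to a weighted sum, $\sum_{a \in A} \pi(i)(a)\,P^a_{ij}$. The Python sketch below computes the transition matrix of the Markov process induced by a stationary randomized policy; the two-state, two-action MDP is a made-up example.

```python
def induced_transition_matrix(P, pi):
    """P[a][i][j]: transition probability under action a.
    pi[i][a]: probability that the policy chooses action a in state i."""
    n = len(next(iter(P.values())))
    return [
        [sum(pi[i][a] * P[a][i][j] for a in P) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical MDP with two states and two actions.
P = {
    "stay":   [[1.0, 0.0], [0.0, 1.0]],
    "switch": [[0.0, 1.0], [1.0, 0.0]],
}
pi = [
    {"stay": 0.25, "switch": 0.75},  # randomize in state 0
    {"stay": 1.0, "switch": 0.0},    # always stay in state 1
]
print(induced_transition_matrix(P, pi))  # [[0.25, 0.75], [0.0, 1.0]]
```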

Rewards and costs (negative rewards) are used to encode perceived value for a decision maker. Different
reward  structures  can  be  used,  but  it  is  common  to  associate  rewards  with  state  transitions.  For  example,  a 
transition  from  s  to  s'  earns  the  decision  maker  an  immediate  reward  k(s,  s').  The  transition  rewards  can 
depend  on  the  action  that  is  chosen  for  a  state.  For  continuous-time  models,  it  is  also  common  to  earn  reward 
at  some  rate  c(s)  for  the  duration  of  time  that  state  s  is  occupied. 

A  decision  maker  chooses  a  policy  according  to  some  optimality  criterion.  The  objective  is  generally  to 
maximize  the  expected  reward  accumulated  during  execution,  but  this  can  be  given  different  interpretations. 
Possibly  the  most  straightforward  interpretation  is  to  maximize  the  expected  total  reward.  This  can  be 
unbounded,  however,  if  execution  can  proceed  ad  infinitum.  To  ensure  that  a  bound  exists,  we  can  halt 
execution after a fixed time bound (finite-horizon total reward) or discount reward earned $t$ time units into
the future by a factor $\gamma^t$ (infinite-horizon discounted reward). Other optimization criteria exist as well (cf.
Puterman  1994). 

Depending  on  the  optimality  criterion,  it  may  not  be  necessary  to  consider  the  most  general  class  of 
policies in order to act optimally. For example, to find a policy that maximizes the infinite-horizon discounted
reward for an MDP, it is sufficient to consider the class of deterministic stationary policies.

⁵ In the model checking literature, a policy is called a scheduler. Model checking for MDPs involves verifying that a property
holds for a certain class of schedulers (cf. Bianco and de Alfaro 1995).

The infinite-horizon discounted reward in state $i$ of a discrete-time MDP controlled by a deterministic stationary
policy $\pi$ is given by the recurrence relation

$$v^{\pi}(i) = r^{\pi(i)}(i) + \gamma \sum_{j \in S} P^{\pi(i)}_{ij} \cdot v^{\pi}(j),$$

where $r^a(i) = \sum_{j \in S} P^a_{ij}\, k(i,j)$ is the expected transition reward in state $i$ for action $a$ (Howard 1960,
p. 77). The optimal value is obtained by maximizing over the set of actions:

$$v^*(i) = \max_{a \in A}\Bigl(r^a(i) + \gamma \sum_{j \in S} P^a_{ij} \cdot v^*(j)\Bigr) \tag{2.24}$$

This equation forms the basis for value iteration, which is a dynamic programming (Bellman 1957) technique
for finding the optimal policy of an MDP. An alternative solution method is policy iteration (Howard
1960), which often requires fewer iterations than value iteration to converge, but with a higher cost per
iteration. A middle ground is provided by Puterman and Shin's (1978) modified policy iteration.
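A minimal value iteration sketch based on (2.24) is shown below, assuming a finite state space, a finite action set, and expected transition rewards $r^a(i)$ given directly; the stopping tolerance and the tiny two-state MDP are illustrative. In the example, staying in state 1 earns reward 1 per step, so with $\gamma = 0.9$ the optimal values are $v^*(1) = 1/(1-0.9) = 10$ and $v^*(0) = 0.9 \cdot 10 = 9$, obtained by switching out of state 0.

```python
def value_iteration(states, actions, P, r, gamma, tol=1e-10):
    """Iterate v(i) <- max_a (r[a][i] + gamma * sum_j P[a][i][j] * v(j))."""

    def q_value(a, i, v):
        return r[a][i] + gamma * sum(P[a][i][j] * v[j] for j in states)

    v = {i: 0.0 for i in states}
    while True:
        v_new = {i: max(q_value(a, i, v) for a in actions) for i in states}
        converged = max(abs(v_new[i] - v[i]) for i in states) < tol
        v = v_new
        if converged:
            break
    # Greedy deterministic stationary policy with respect to the final values.
    policy = {i: max(actions, key=lambda a: q_value(a, i, v)) for i in states}
    return v, policy


states = [0, 1]
actions = ["stay", "switch"]
P = {"stay": [[1.0, 0.0], [0.0, 1.0]], "switch": [[0.0, 1.0], [1.0, 0.0]]}
r = {"stay": [0.0, 1.0], "switch": [0.0, 0.0]}  # expected transition rewards r^a(i)
v, policy = value_iteration(states, actions, P, r, gamma=0.9)
print(policy)  # {0: 'switch', 1: 'stay'}
```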

Howard (1960) shows that the continuous-time MDP with discounting is computationally equivalent to
its discrete-time counterpart, and describes how a continuous-time MDP can be transformed into a discrete-time
MDP using a technique analogous to uniformization (Jensen 1953) for Markov processes. The equivalence
between continuous and discrete-time MDPs is further explored by Lippman (1975), and generalized
to countable state spaces by Serfozo (1979). Uniformization is a technique by which a continuous-time
MDP with state-dependent exit rates can be transformed into an equivalent continuous-time MDP with the
same (uniform) exit rate for all states. The uniform continuous-time MDP can then be treated as a discrete-time
MDP resulting from observing the original continuous-time MDP at a constant rate. Uniformization
introduces self-transitions not present in the original model, because it is possible to remain in the same state
from one observation to another. Puterman (1994) presents uniformization as the preferred method for solving
continuous-time MDPs, but we show in Chapter 9 that it can be more efficient to solve a continuous-time
MDP directly, without first transforming it into a discrete-time MDP.
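The uniformization step itself is short: pick a uniform rate $q$ no smaller than any exit rate and set $P = I + Q/q$. The Python sketch below applies this to a single generator matrix (for an MDP, one common $q$ would be used for every $Q^a$); the example generator is hypothetical. Note the self-transition that appears in state 0, where the original process has none.

```python
def uniformize(Q, q=None):
    """Return (P, q) with P = I + Q / q, where q >= max_i -Q[i][i]."""
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if q is None:
        q = max_exit
    if q < max_exit:
        raise ValueError("uniformization rate must be at least the largest exit rate")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)] for i in range(n)]
    return P, q


# Hypothetical three-state generator (rows sum to zero).
Q = [
    [-1.0, 1.0, 0.0],
    [0.5, -1.5, 1.0],
    [0.0, 0.5, -0.5],
]
P, q = uniformize(Q)
```

Choosing $q$ equal to the largest exit rate keeps the self-transition probabilities as small as possible; any larger $q$ also yields a valid stochastic matrix.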

The  semi-Markov  decision  process  (SMDP;  Howard  1963,  1971b),  a  decision  theoretic  extension  of  the 
semi-Markov process, permits time between state transitions to be governed by a general positive distribution.
Chitgopekar (1969), Stone (1973), and Cantaluppi (1984) consider generalizations of the SMDP model
where  the  action  choice  is  allowed  to  change  not  only  at  the  time  of  state  transitions,  but  also  at  time  points 
between  state  transitions.  Chapter  9  discusses  this  issue  further. 


Chapter  3 


Related  Work 


This chapter discusses related research in the model checking, operations research, and AI planning literature.
We focus primarily on research dealing with probabilistic systems, although in the case of planning
we also mention efforts involving nondeterministic systems. We do not attempt to produce an exhaustive
account of all past research concerning probabilistic verification and planning under uncertainty, as it would
be a daunting task. The research efforts mentioned in this chapter should instead be thought of as a representative
sample of all related work.

3.1  Probabilistic  Verification 

Early work on probabilistic verification has a clear focus on discrete-time models, with the verification of
randomized algorithms as the primary application in mind. Hart et al. (1983) analyze termination of concurrent
probabilistic programs. Lehmann and Shelah (1982) and Hart and Sharir (1984) introduce probabilistic
temporal logics for specifying properties of probabilistic programs. These logics can only express properties
that either hold with probability one or with non-zero probability, so a verification algorithm can ignore
the  actual  probabilities  in  a  model.  In  contrast,  the  logic  of  Reif  (1980)  permits  properties  with  rational 
probability  thresholds  other  than  zero  and  one.  Automated  verification  as  model  checking  was  pioneered 
by  Clarke  and  Emerson  (1982).  Vardi  (1985)  and  Courcoubetis  and  Yannakakis  (1995)  describe  model 
checking  algorithms  for  linear  temporal  logic,  LTL,  when  the  model  is  a  probabilistic  program  and  the  LTL 
formula  is  required  to  hold  with  probability  one. 
Hansson and Jonsson (1989, 1994) present the probabilistic real-time computation tree logic, PCTL,
based on CTL (Clarke and Emerson 1982; Clarke et al. 1986) but with the path quantifiers “for all trajectories”
(∀) and “there exists a trajectory” (∃) replaced by a single probabilistic path quantifier with a probability
threshold not restricted to the values zero and one. In PCTL, one can also associate a time bound with a
path operator, such as “until”, enabling one to impose deadlines for reaching certain states. PCTL formulae
are interpreted over discrete-time Markov processes, and each state transition corresponds to a time unit.
Hansson and Jonsson provide algorithms for PCTL model checking with finite-state models. In the general
case, with a finite time bound and a probability threshold in the interval (0, 1), PCTL model checking can
be solved numerically in either $O(t \cdot (|S| + |E|))$ or $O(\log(t) \cdot |S|^3)$ time, where $t$ is the time bound, $|S|$ is
the size of the state space, and $|E|$ is the number of state transitions with non-zero probability in the Markov
process. Since $|E|$ is at most $|S|^2$, and typically no less than $|S|$, PCTL model checking is polynomial in
the size of the state space. To handle large state spaces, Baier et al. (1997) propose using multi-terminal
binary decision diagrams, MTBDDs (Clarke et al. 1993; Bahar et al. 1993; Fujita et al. 1997), to carry out
the numerical computations.
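To see where the $O(t \cdot (|S| + |E|))$ bound can come from, consider a time-bounded until formula $\phi\,\mathcal{U}^{\le t}\,\psi$. One way to evaluate it, sketched below in Python, performs $t$ rounds of a sparse matrix-vector product, and each round visits every state and every non-zero transition once. The adjacency-dictionary encoding and the three-state chain are illustrative assumptions, not Hansson and Jonsson's data structures.

```python
def bounded_until(P, phi, psi, t):
    """Probability, from each state, of reaching psi within t steps while staying in phi.

    P: sparse transition matrix as {i: {j: P_ij}}; phi, psi: sets of states.
    """
    x = {i: 1.0 if i in psi else 0.0 for i in P}
    for _ in range(t):
        x = {
            i: 1.0 if i in psi
            else sum(p * x[j] for j, p in P[i].items()) if i in phi
            else 0.0
            for i in P
        }
    return x


# Hypothetical chain: state 0 loops or moves to 1; state 1 moves to the absorbing state 2.
P = {0: {0: 0.5, 1: 0.5}, 1: {2: 1.0}, 2: {2: 1.0}}
print(bounded_until(P, {0, 1}, {2}, 2)[0])  # 0.5
print(bounded_until(P, {0, 1}, {2}, 3)[0])  # 0.75
```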

Aziz  et  al.  (1996,  2000)  propose  the  logic  CSL,  the  continuous  stochastic  logic,  as  a  variation  of  PCTL 
for  expressing  properties  of  continuous-time  Markov  processes.  They  prove  that  CSL  model  checking  is 
decidable  for  rational  time  bounds,  but  do  not  provide  a  practical  model  checking  algorithm.  Baier  et  al. 
(1999)  present  a  numerical  model  checking  algorithm,  using  MTBDDs,  for  a  variation  of  CSL  with  the 
addition of a steady-state operator. Model checking of time-bounded CSL formulae amounts to solving a
system of Volterra integral equations, but solving this equation system is time consuming and numerical
stability is hard to achieve (Hermanns et al. 2000). A better solution method is provided by Baier et al.
(2000), who show that CSL model checking of time-bounded formulae can be reduced to transient analysis
of continuous-time Markov processes and suggest the use of sparse matrices instead of MTBDDs. The
former means that time-bounded CSL properties can be verified using existing techniques for transient
analysis, in particular uniformization¹ (Jensen 1953), which have been used extensively in the performance
evaluation literature (Grassmann 1977; Gross and Miller 1984; Reibman and Trivedi 1988; Malhotra et al.
1994). While Baier et al. suggest that uniformization should be applied to each individual state separately,
resulting in a time complexity of $O(q \cdot t \cdot |S| \cdot |E|)$ for time-bounded CSL formulae ($q$ is the uniformization
constant and can be set to the maximum exit rate for the model), Katoen et al. (2001) improve the time
complexity by a factor $O(|S|)$ by noting that uniformization can be performed for all states simultaneously.
These contributions are summarized by Baier et al. (2003).

¹ Other names for this technique are randomization and Jensen's method.

While  MTBDDs  often  can  represent  the  transition  matrix  of  a  Markov  process  in  a  compact  manner, 
they are not always an efficient representation for numerical computation. Kwiatkowska et al. (2002b,
2004) explore different representations, including a hybrid approach that combines the MTBDD representation
of transition matrices with a flat representation of iteration vectors. This hybrid approach is generally
faster  than  MTBDDs,  while  handling  larger  systems  than  sparse  matrices.  Another  promising  approach  is 
presented  by  Buchholz  et  al.  (2003),  who  use  Kronecker  products  to  exploit  structure  in  the  models. 

Infante Lopez et al. (2001) go beyond Markov models by considering the CSL model checking problem for semi-Markov processes. For CSL formulae without a time bound, the problem reduces to probabilistic model checking for discrete-time Markov processes. For time-bounded formulae, the model checking problem amounts to solving a system of Volterra integral equations. A different approach is taken by Kwiatkowska et al. (2002a). Time bounds in CSL are typically specified as intervals of real numbers, but Kwiatkowska et al. associate positive probability distributions with time bounds and suggest that this can be used to express certain properties of systems with general distributions while still using Markov models of the systems. Alur et al. (1991) describe a model checking algorithm for generalized semi-Markov processes, but only for probability thresholds zero or one and restricted to trigger time distributions with finite support. Kwiatkowska et al. (2000) use a similar approach for probabilistic timed automata, and permit arbitrary probability thresholds.

Even with the use of clever data structures, numerical solution techniques tend to suffer greatly from the state space explosion problem. The statistical approach presented in Chapter 5 is an attempt to overcome the limitations of numerical solution techniques for large state spaces, while providing only statistical correctness guarantees. Lassaigne and Peyronnet (2002) propose a statistical approach for model checking a fragment of LTL. They do not formulate a hypothesis testing problem, but instead rely on less efficient techniques for statistical estimation. In probabilistic model checking, the question is whether a probability is above or below some threshold, and it would typically be a waste of effort to obtain an accurate estimate of a probability only to realize that it is far from the specified threshold. Grosu and Smolka (2004) present a Monte Carlo approach to LTL model checking for non-probabilistic systems, but take the same approach as Lassaigne and Peyronnet by relying on statistical estimation rather than hypothesis testing. In this case, it makes even less sense to use estimation techniques because the estimated probability has no clear meaning: there are no probabilities in the model. Sen et al. (2004) describe a statistical approach, based on hypothesis testing, for verifying probabilistic systems. They assume that the system has already been deployed so that execution traces cannot be generated on demand. We discuss their approach in further detail in Chapter 7, where we expose some serious flaws in their proposed solution method.
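To make the contrast between estimation and hypothesis testing concrete, here is a minimal sketch of Wald's sequential probability ratio test (SPRT), one standard sequential test for deciding whether a probability lies above or below a threshold. It assumes an indifference region of half-width $\delta$ around the threshold $\theta$ and error bounds $\alpha$ and $\beta$; `sample` is a hypothetical callback that simulates one trajectory and reports whether the path formula held. Because the test stops as soon as the evidence is strong enough, it needs few samples when the true probability is far from $\theta$, which is exactly the case where accurate estimation is wasted effort.

```python
import math
import random


def sprt(sample, theta, delta, alpha, beta, max_samples=1_000_000):
    """Wald's SPRT for H0: p >= theta + delta against H1: p <= theta - delta.

    sample -- zero-argument callable returning True when one simulated
              trajectory satisfies the property (hypothetical interface)
    alpha  -- bound on the probability of rejecting H0 when it holds
    beta   -- bound on the probability of accepting H0 when H1 holds
    Returns (accept_h0, number_of_samples_used).
    """
    p0, p1 = theta + delta, theta - delta
    log_a = math.log((1 - beta) / alpha)  # reject H0 at or above this
    log_b = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0  # log of the likelihood ratio L(H1) / L(H0)
    for n in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= log_a:
            return False, n
        if llr <= log_b:
            return True, n
    raise RuntimeError("no decision within max_samples")


# Hypothetical usage: a Bernoulli "simulator" whose true success probability is 0.9.
rng = random.Random(1)
print(sprt(lambda: rng.random() < 0.9, theta=0.5, delta=0.05, alpha=0.01, beta=0.01))
```

With a true probability of 0.9 and threshold 0.5, the test typically accepts $H_0$ after a few dozen samples, far fewer than an estimate with a narrow confidence interval would require.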


3.2  Planning  under  Uncertainty 

Current approaches to planning under uncertainty can be divided roughly into distinct categories based on their representation of uncertainty, how goals are specified, the model of time used, and assumptions made regarding observability. Two prevalent representations of uncertainty are nondeterministic and stochastic models. In nondeterministic models, uncertainty is represented strictly logically, using disjunction, while in stochastic models uncertainty is specified with probability distributions over the possible outcomes of events and actions.

The  objective  when  planning  with  nondeterministic  models  is  often,  although  not  always,  to  generate 
a  universal  plan  (Schoppers  1987)  that  is  guaranteed  to  achieve  a  specified  goal  regardless  of  the  actual 
outcomes  of  events  and  actions.  A  goal  can  be  a  set  of  desirable  states,  as  in  the  work  of  Cimatti  et  al. 
(1998)  and  Jensen  and  Veloso  (2000),  or  a  modal  temporal  logic  formula  as  proposed  by  Kabanza  et  al. 
(1997)  and  Pistore  and  Traverso  (2001).  Conditional  planners,  such  as  CNLP  (Peot  and  Smith  1992)  and 
Plinth  (Goldman  and  Boddy  1994a),  are  also  examples  of  planners  for  nondeterministic  domains. 

Ginsberg (1989) questions the practical value of universal nondeterministic planning. His main concern is that the representation of a universal plan is bound to be infeasibly large for interesting problems. It is impractical, Ginsberg argues, for an agent to precompute its response to every situation in which it might find itself, simply because the number of situations is prohibitively large. In control theory, Balemi et al. (1993) propose the use of ordered binary decision diagrams (BDDs; Bryant 1986) as a compact representation of supervisory controllers, and this representation has more recently also been used in the AI community for nondeterministic planning (Cimatti et al. 1998; Jensen and Veloso 2000). Kabanza et al. (1997) attempt to address the time complexity problem by proposing an incremental algorithm for constructing partial policies. Their planning system relies on domain-specific search control rules for efficiency, and produces a universal plan if given enough time.

By requiring a stochastic domain model, with state transitions weighted by probabilities, a probabilistic planner has a more detailed model of uncertainty to work with. It can therefore choose to focus planning effort on the most relevant parts of the state space. A plan may fail because some contingencies have not been planned for, but this is acceptable so long as the success probability of the plan is high. Having to deal with probabilities can, however, be computationally more challenging than working with nondeterministic models. In recent work, Jensen et al. (2004) present a compromise solution, distinguishing between primary and secondary effects of actions without assigning probabilities to state transitions. The resulting planning framework can produce plans that are robust for up to $n$ faults.

Drummond  and  Bresina  (1990)  present  an  anytime  algorithm  for  generating  partial  policies  with  high 
probability  of  achieving  goals  expressed  using  a  modal  temporal  logic.  Other  research  on  probabilistic 
planning  typically  considers  only  propositional  goals.  Kushmerick  et  al.  (1995)  and  Lesh  et  al.  (1998)  work 
with  plans  consisting  of  actions  that  are  executed  in  sequence  regardless  of  the  outcome  of  the  previous 
actions.  This  is  often  called  conformant  planning  (Smith  and  Weld  1998).  Conditional  probabilistic  plans 
(Blythe  1994;  Draper  et  al.  1994;  Goldman  and  Boddy  1994b)  allow  for  some  adaptation  to  the  situation 
during  plan  execution.  In  the  work  by  Draper  et  al.,  this  adaptation  is  obtained  by  means  of  explicit  sensing 
actions  that  are  made  part  of  the  plan. 

Sampling techniques have been used for probabilistic plan assessment by Blythe (1994) and Lesh et al. (1998). In both cases, however, the probability of plan success is estimated using flawed statistical methods. The estimation is based on the normal approximation, which is known to give unreliable results when used to estimate proportions (see, e.g., Fujino 1980; Hall 1982; Agresti and Coull 1998; Newcombe 1998; Brown et al. 2001). Furthermore, statistical hypothesis testing would be more appropriate in both cases because the probability estimate is only used to compare two plans or to test if the success probability exceeds a specified threshold. Lesh et al. use an interesting data mining technique, however, for analyzing simulation traces in order to discover plan flaws. The technique, which is more thoroughly described by Zaki et al. (2000), targets discrete-time planning domains. It has some similarities with our failure analysis approach presented in Chapter 8, and is in many ways more ambitious than our approach.
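The weakness of normal-approximation intervals for proportions is easy to reproduce. The sketch below contrasts the normal-approximation (Wald) interval with the Wilson score interval, one of the alternatives discussed in the literature cited above, for a hypothetical run in which all 20 simulated executions of a plan succeed. The function names are illustrative, and $z = 1.96$ corresponds to a nominal 95% level.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


print(wald_interval(20, 20))                                   # (1.0, 1.0)
print(tuple(round(x, 3) for x in wilson_interval(20, 20)))     # (0.839, 1.0)
```

The Wald interval collapses to the single point 1, claiming certainty after only 20 trials, while the Wilson interval still admits success probabilities down to roughly 0.84. Neither, of course, addresses the point that a threshold test is the more appropriate tool here.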

In decision theoretic planning, a reward structure is added to the probabilistic model, and the objective is to find a control policy that maximizes the expected reward during execution. The discrete-time MDP formalism (Section 2.4) has received significant attention in the AI community in the past decade, with applications ranging from robot navigation (Koenig et al. 1995) to elevator control (Nikovski and Brand 2003). Considerable progress has been made on algorithms for MDP planning that exploit structure in the model. Boutilier et al. (1995) use dynamic Bayesian networks (Dean and Kanazawa 1989) to represent transition probability matrices and decision trees to represent conditional probability tables and policies, and propose the structured policy iteration algorithm. Hoey et al. (1999) use a similar approach, but replace decision trees with MTBDDs.²
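As a point of reference for the structured methods mentioned above, the sketch below shows plain (flat) value iteration for a small discrete-time MDP under the expected discounted reward criterion. The dictionary-based model format, the discount factor `gamma`, and the two-state example are assumptions made for illustration. Structured approaches such as those of Boutilier et al. and Hoey et al. target the same kind of optimal value function, but manipulate factored representations instead of the explicit state tables used here.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-9):
    """Optimal values and a greedy policy for a finite discounted MDP.

    P[s][a] -- list of (next_state, probability) pairs
    R[s][a] -- immediate expected reward for taking action a in state s
    """
    V = {s: 0.0 for s in P}
    while True:
        # One Bellman backup: action values under the current value estimate.
        Qv = {s: {a: R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a])
                  for a in P[s]}
              for s in P}
        V_new = {s: max(Qv[s].values()) for s in P}
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            policy = {s: max(Qv[s], key=Qv[s].get) for s in P}
            return V_new, policy
        V = V_new


# Hypothetical example: staying in "a" pays 1 per step, moving to "b" pays
# nothing immediately, but staying in "b" pays 2 per step.
P = {"a": {"stay": [("a", 1.0)], "go": [("b", 1.0)]},
     "b": {"stay": [("b", 1.0)]}}
R = {"a": {"stay": 1.0, "go": 0.0}, "b": {"stay": 2.0}}
V, policy = value_iteration(P, R)
print(round(V["a"], 4), round(V["b"], 4), policy)  # 18.0 20.0 {'a': 'go', 'b': 'stay'}
```

The explicit tables make the cost visible: every backup touches every state and action, which is precisely what factored representations try to avoid.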

Even structured solution techniques suffer from the state space explosion problem. Approximate solution techniques, including automated state abstraction (Boutilier and Dearden 1994; Dearden and Boutilier 1997) and value function approximation (Bellman et al. 1963; Gordon 1995; Guestrin et al. 2003), aim to address this problem by sacrificing optimality for efficiency.

Boyan  and  Littman  (2001)  propose  an  extension  of  MDPs — time-dependent  MDPs  (TMDPs) — where 
the  time  between  state  transitions  can  depend  on  the  current  time.  The  model  corresponds  to  a  general  state 
space  MDP  with  a  single  continuous  state  variable  representing  global  time.  State  spaces  with  multiple 
continuous  state  variables  are  considered  by  Feng  et  al.  (2004),  but  restricted  to  discrete  transition  functions 
(i.e.  each  state  can  only  have  a  finite  number  of  possible  successors). 

In Chapter 9, we introduce the generalized semi-Markov decision process (GSMDP), which can be used to model decision theoretic planning problems with asynchronous events and actions. GSMDPs can be viewed as compositions of asynchronous SMDPs. They differ from TMDPs in that they essentially require one local clock for each event in the model. The algorithm of Feng et al. (2004) can handle multiple continuous state variables, but the restriction to discrete transition functions makes it inadequate for GSMDP planning. Some attention has recently been given to planning with concurrent actions. Guestrin et al. (2002) and Mausam and Weld (2004) use discrete-time MDPs to model and solve planning problems with concurrent actions, but these approaches are restricted to instantaneous actions executed in synchrony. Rohanimanesh and Mahadevan (2001) consider planning problems with temporally extended actions that can be executed in parallel. By restricting the temporally extended actions to Markov options, the resulting planning problems can be modeled as discrete-time SMDPs.

²MTBDDs are also known as algebraic decision diagrams (ADDs), and this is the name typically used in the AI literature.
The GSMDP framework can be thought of as a probabilistic and decision theoretic extension of the planning framework developed by Musliner et al. (1995), which is part of the CIRCA architecture. The CIRCA domain model is a nondeterministic timed automaton, with uncertainty in the duration and outcome of events and actions. The nondeterministic domain model is essentially a GSMP, but with intervals of possible delays in place of delay distributions and without probabilities associated with state transitions. The CIRCA planner can generate plans that are guaranteed to maintain system safety. Plan generation is done incrementally. Starting from the initial state, actions are assigned to states as the states are determined to be reachable. Following each action assignment, the current plan is verified to see if failure is always avoided. If a failure state is reachable, the planner backtracks to consider alternative action assignments. A counterexample, in the form of an execution trace, is generated by the verifier if a plan is determined to be unsafe. The counterexample traces can be used to guide plan repair (Goldman et al. 2004). The probabilistic planning framework presented in Chapter 8 is based on the CIRCA planning framework. In particular, it makes use of a verifier to find reasons for plan failure.
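The incremental assign-and-verify loop described above can be sketched as a backtracking search. The Python below is a toy illustration under strong simplifying assumptions, not CIRCA itself: the interfaces `actions`, `successors`, and `is_failure` are hypothetical, `successors(s, a)` is assumed to already include the effects of uncontrollable events, and the timing information that the CIRCA verifier reasons about is ignored, so "verification" reduces to checking that no failure state can be reached in one step from a newly planned state. Backtracking here is purely chronological, whereas CIRCA can use counterexample traces to guide repair.

```python
def plan(initial, actions, successors, is_failure):
    """Assign an action to every reachable state so that no failure state is
    reachable; return the assignment as a dict, or None if none exists.

    actions(s)       -- candidate actions in state s, in order of preference
    successors(s, a) -- set of possible next states when a is planned in s
    is_failure(s)    -- True for states that must never be reached
    """
    def extend(policy, frontier):
        # Drop frontier states that already have an action.
        while frontier and frontier[0] in policy:
            frontier = frontier[1:]
        if not frontier:
            return policy  # every reachable state has a safe action
        s, rest = frontier[0], frontier[1:]
        for a in actions(s):
            succ = successors(s, a)
            # Simplified "verifier": a reachable failure state is a counterexample.
            if any(is_failure(t) for t in succ):
                continue
            new = sorted(t for t in succ if t != s and t not in policy)
            result = extend({**policy, s: a}, rest + new)
            if result is not None:
                return result
        return None  # no safe action for s: backtrack to an earlier choice

    return None if is_failure(initial) else extend({}, [initial])


acts = {"s0": ["risky", "safe"], "s1": ["wait"]}
succ = {("s0", "risky"): {"s1", "fail"}, ("s0", "safe"): {"s1"},
        ("s1", "wait"): {"s1"}}
print(plan("s0", lambda s: acts[s], lambda s, a: succ[(s, a)],
           lambda s: s == "fail"))  # {'s0': 'safe', 's1': 'wait'}
```

The preferred action `risky` is rejected because the verifier finds a path to `fail`, and the planner falls back to `safe`; if no action for some state avoided failure, the search would undo earlier assignments.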

Atkins  et  al.  (1996)  describe  a  probabilistic  extension  of  CIRCA,  but  it  does  not  permit  a  modular 
specification  of  asynchronous  events.  The  user  is  required  to  specify  the  joint  distribution  for  any  set  of 
events  that  can  be  enabled  simultaneously,  which  can  be  rather  cumbersome.  Furthermore,  their  approach 
does  not  handle  state  spaces  with  cycles.  Li  et  al.  (2003)  attempt  to  address  some  of  these  issues,  but  rely  on 
ad  hoc  approximation  techniques  using  “probability  rate  functions.”  The  use  of  phase-type  distributions,  as 
described  in  Chapter  9,  is  a  more  principled  way  of  dealing  with  general  delay  distributions  for  asynchronous 
events  and  actions. 
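As a rough illustration of the idea only (Chapter 9 describes the approach actually used), the sketch below replaces a delay distribution by an Erlang distribution, the simplest phase-type distribution: a chain of $k$ exponential phases with a common rate. It uses a simple two-moment heuristic chosen here for brevity: $k = \lceil 1/c^2 \rceil$, where $c^2$ is the squared coefficient of variation, and rate $k/m$ for mean $m$. This only works for $c^2 \leq 1$; more variable distributions, such as a Weibull distribution with shape 0.5 ($c^2 = 5$), need other phase-type families. The function names are hypothetical.

```python
import math


def erlang_fit(mean, variance):
    """Two-moment Erlang fit: k phases, each with the same exponential rate.

    Matches the mean exactly; the fitted squared coefficient of variation is
    1/k, the largest value of that form not exceeding the original one.
    """
    cv2 = variance / mean ** 2
    if cv2 > 1.0:
        raise ValueError("cv^2 > 1: an Erlang chain cannot match this variability")
    k = max(1, math.ceil(round(1.0 / cv2, 9)))  # rounding guards against float noise
    return k, k / mean


def erlang_cdf(x, k, rate):
    """P[X <= x] for an Erlang distribution with k phases of the given rate."""
    return 1.0 - sum(math.exp(-rate * x) * (rate * x) ** n / math.factorial(n)
                     for n in range(k))


def weibull_moments(scale, shape):
    """Mean and variance of a Weibull distribution W(scale, shape)."""
    m1 = scale * math.gamma(1 + 1 / shape)
    m2 = scale ** 2 * math.gamma(1 + 2 / shape)
    return m1, m2 - m1 ** 2


k, rate = erlang_fit(*weibull_moments(1.0, 1.5))
print(k, round(rate, 3))  # 3 3.323
```

Each phase is memoryless, so the resulting model is a continuous-time Markov process; the current phase acts as a coarse record of how long the delayed event has been enabled.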


Part  I 

Verification 
Chapter 4


Specifying Properties of
Stochastic  Discrete  Event  Systems 


Given a stochastic discrete event system, it is often of interest to be able to specify properties of the system. These properties could represent behavior that we want the system to exhibit during execution. For example, a desirable property of a telephone system might be that the probability of a call getting dropped is low. To enable automatic verification of stochastic discrete event systems, we need a formalism for expressing interesting properties of such systems. This chapter introduces the unified temporal stochastic logic (UTSL), which can be used to express properties such as “the probability is at most 0.01 that a call is dropped within 60 minutes from now.” UTSL has essentially the same syntax as the existing logics PCTL and CSL, but UTSL provides a unified semantics for both discrete-time and continuous-time systems, as well as systems with discrete, continuous, and general state spaces. This will allow us, for the most part, to treat all stochastic discrete event systems uniformly when presenting a statistical approach to probabilistic model checking in the next chapter.


4.1  Temporal  Logic 

The use of temporal logic (Rescher and Urquhart 1971) for specifying properties of deterministic and nondeterministic systems with program verification in mind was pioneered by Pnueli (1977) and is now a widespread practice in the model checking community. The propositional branching-time logic CTL (computation tree logic; Clarke and Emerson 1982; Clarke et al. 1986), a particularly popular formalism, can be used to express properties such as “for all trajectories, $\Psi$ eventually becomes true with $\Phi$ holding continuously until then” and “there exists a trajectory such that $\Phi$ holds after the next state transition.” CTL is related to the branching-time logic of Ben-Ari et al. (1981, 1983) and the branching-time temporal logic described by Lamport (1980). Emerson (1990) provides an excellent survey of temporal logics with a model checking perspective.

For many real-time systems, it is important to ensure that deadlines are met. To reason about deadlines, we need to be able to express quantitative temporal properties of a system. Extensions of CTL with time as a discrete (RTCTL; Emerson et al. 1990, 1992) or continuous (TCTL; Alur et al. 1990, 1993) quantity have therefore been proposed. With RTCTL and TCTL, it is possible to express timed properties such as “for all trajectories, $\Phi$ becomes true within $t$ time units.” Earlier work in the same direction includes Bernstein and Harter’s (1981) extension of Lamport’s logic that associates time bounds with eventualities. A survey on the topic of logics for real-time systems is provided by Alur and Henzinger (1992).

The logic TCTL has also been proposed as a formalism for expressing properties of continuous-time stochastic systems, but with “for all trajectories” ($\forall$) and “there exists a trajectory” ($\exists$) reinterpreted as “with probability one” and “with positive probability”, respectively (Alur et al. 1991). The same interpretation is given to the path quantifiers $\forall$ and $\exists$ in earlier work by Hart and Sharir (1984) on the branching-time logic PTL for discrete-time stochastic processes.


4.2  UTSL:  The  Unified  Temporal  Stochastic  Logic 

In many cases, it is not economically or physically feasible to ensure certain behaviors with probability one, but simply guaranteeing that the behavior can be exhibited by the system with positive probability may be too weak. For example, designing a telephone system where no call is ever dropped would be excessively costly, but it is not satisfactory to just know that a call can possibly go through. For the telephone system, we would like to ensure that calls go through with a reasonably high probability, for example 0.9999. Neither TCTL nor PTL permits us to express such a property. For this, we need a different path quantifier, which is provided by PCTL (Hansson and Jonsson 1989, 1994). PCTL has quantitative time bounds just as RTCTL, on which PCTL is based, but the path quantifiers $\forall$ and $\exists$ are replaced by a single probabilistic path quantifier. This lets us express quantitative bounds on the probability of a set of trajectories. For example, PCTL can express the property “with probability at least $\theta$, $\Phi$ will be satisfied within $t$ time units.”

PCTL formulae are interpreted over discrete-time Markov processes. Aziz et al. (1996, 2000) propose a similar logic, CSL (continuous stochastic logic), with formulae interpreted over continuous-time Markov processes. A variation of CSL has been proposed by Baier et al. (1999, 2003), which also includes a facility for expressing bounds on steady-state probabilities. This version of CSL has also been used for expressing properties of semi-Markov processes (Infante Lopez et al. 2001). Yet another logic, with essentially the same syntax as PCTL, has been proposed for expressing properties of probabilistic timed automata (Kwiatkowska et al. 2000). While the difference in syntax is minimal between all mentioned logics for expressing probabilistic real-time properties, the semantics of the various logics are tied to specific classes of stochastic processes, for example discrete-time Markov processes in the case of PCTL. To avoid having to refer to different logics for different classes of systems, we introduce the logic UTSL, with a unified semantics for all measurable stochastic discrete event systems.

The  syntactic  structure  of  UTSL  is  the  same  as  that  of  both  CSL  (without  the  steady-state  operator)  and 
PCTL,  although  we  use  the  notation  of  Baier  et  al.  (2003)  rather  than  that  of  Hansson  and  Jonsson  (1994). 

Definition 4.1 (UTSL Syntax). Let $\mathcal{M} = (S, T, \mu, SV, V)$ be a factored stochastic discrete event system. The syntax for UTSL is defined inductively as follows:

1. $x \sim v$ is a UTSL formula for $x \in SV$, $v \in D_x$, and ${\sim} \in \{\leq, =, \geq\}$.

2. $\neg\Phi$ is a UTSL formula if $\Phi$ is a UTSL formula.

3. $\Phi \wedge \Psi$ is a UTSL formula if both $\Phi$ and $\Psi$ are UTSL formulae.

4. $\mathcal{P}_{\bowtie\theta}[\mathcal{X}^I\,\Phi]$, for ${\bowtie} \in \{\leq, \geq\}$, $\theta \in [0, 1]$, and $I \subseteq T$, is a UTSL formula if $\Phi$ is a UTSL formula.

5. $\mathcal{P}_{\bowtie\theta}[\Phi\,\mathcal{U}^I\,\Psi]$, for ${\bowtie} \in \{\leq, \geq\}$, $\theta \in [0, 1]$, and $I \subseteq T$, is a UTSL formula if both $\Phi$ and $\Psi$ are UTSL formulae.

If  the  time  domain  T  is  the  non-negative  integers,  UTSL  syntax  coincides  with  PCTL  syntax,  and  we 
get  CSL  syntax  by  letting  T  be  the  non-negative  real  numbers. 
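The inductive definition maps directly onto an algebraic data type. The following Python sketch is one possible encoding, offered as an illustration rather than as part of UTSL itself: the class names are hypothetical, and time bounds are simplified to closed intervals $[lo, hi]$, whereas Definition 4.1 allows any $I \subseteq T$. The constructor checks enforce the side conditions on $\sim$, $\bowtie$, and $\theta$.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Compare:
    """Atomic formula x ~ v, with ~ one of <=, =, >=."""
    var: str
    op: str
    value: object

    def __post_init__(self):
        if self.op not in ("<=", "=", ">="):
            raise ValueError(f"bad comparison operator {self.op!r}")


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    """Path formula X^I Phi, with I simplified to a closed interval (lo, hi)."""
    interval: Tuple[float, float]
    arg: "Formula"


@dataclass(frozen=True)
class Until:
    """Path formula Phi U^I Psi, with I simplified to a closed interval (lo, hi)."""
    interval: Tuple[float, float]
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Prob:
    """Probabilistic formula P_{op theta}[path], with op one of <=, >=."""
    op: str
    theta: float
    path: Union[Next, Until]

    def __post_init__(self):
        if self.op not in ("<=", ">="):
            raise ValueError(f"bad probability bound {self.op!r}")
        if not 0.0 <= self.theta <= 1.0:
            raise ValueError("theta must lie in [0, 1]")


Formula = Union[Compare, Not, And, Prob]


def to_str(f):
    """Render a formula in a plain-ASCII concrete syntax."""
    if isinstance(f, Compare):
        return f"{f.var}{f.op}{f.value}"
    if isinstance(f, Not):
        return f"!{to_str(f.arg)}"
    if isinstance(f, And):
        return f"({to_str(f.left)} & {to_str(f.right)})"
    if isinstance(f, Prob):
        lo, hi = f.path.interval
        if isinstance(f.path, Next):
            body = f"X[{lo},{hi}] {to_str(f.path.arg)}"
        else:
            body = f"{to_str(f.path.left)} U[{lo},{hi}] {to_str(f.path.right)}"
        return f"P{f.op}{f.theta}[{body}]"
    raise TypeError(f"not a UTSL formula: {f!r}")


psi = Prob(">=", 0.5, Until((0, 1), Compare("x", "=", 0), Compare("x", "=", 1)))
print(to_str(psi))  # P>=0.5[x=0 U[0,1] x=1]
```

Because path formulae (`Next`, `Until`) are not members of `Formula`, a path formula can only appear directly under a probabilistic operator, mirroring rules 4 and 5.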

The standard logic operators, $\neg$ and $\wedge$, have their usual meaning. The UTSL operator $\mathcal{P}_{\bowtie\theta}[\cdot]$ replaces the traditional CTL path quantifiers $\forall$ and $\exists$. The truth value of a path formula $\varphi$, i.e. either $\mathcal{X}^I\,\Phi$ (“next”) or $\Phi\,\mathcal{U}^I\,\Psi$ (“until”), is determined over a trajectory (sample path) for a system. The path formula $\mathcal{X}^I\,\Phi$ asserts that the next state transition occurs $t \in I$ time units into the future and that $\Phi$ holds at the time instant immediately following the state transition, while $\Phi\,\mathcal{U}^I\,\Psi$ asserts that $\Psi$ becomes true $t \in I$ time units into the future while $\Phi$ holds continuously prior to time $t$. Since we are dealing with measurable stochastic systems, there is some probability associated with the set of trajectories that satisfy $\varphi$. The probabilistic path quantifier $\mathcal{P}_{\bowtie\theta}[\cdot]$ permits us to compare this probability against an arbitrary threshold $\theta$.

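Definition 4.1 provides a bare-bones version of UTSL. Additional UTSL formulae can be derived in the usual way. For example, $\bot \equiv (x = v) \wedge \neg(x = v)$ for some $x \in SV$ and $v \in D_x$, $\top \equiv \neg\bot$, $x < v \equiv \neg(x \geq v)$, $\Phi \vee \Psi \equiv \neg(\neg\Phi \wedge \neg\Psi)$, $\Phi \rightarrow \Psi \equiv \neg\Phi \vee \Psi$, and $\mathcal{P}_{<\theta}[\varphi] \equiv \neg\mathcal{P}_{\geq\theta}[\varphi]$. We have associated a time bound $I$ with the path operators $\mathcal{X}$ and $\mathcal{U}$. The unbounded versions of these operators are obtained by letting $I$ equal the time domain $T$, for example $\mathcal{P}_{\bowtie\theta}[\Phi\,\mathcal{U}\,\Psi] \equiv \mathcal{P}_{\bowtie\theta}[\Phi\,\mathcal{U}^T\,\Psi]$. We can derive additional path operators, such as $\mathcal{W}$ (“weak until”), $\Diamond$ (“eventually”), and $\Box$ (“continuously”), as follows (Hansson and Jonsson 1994):

$$\mathcal{P}_{\geq\theta}[\Phi\,\mathcal{W}^I\,\Psi] \equiv \mathcal{P}_{\leq 1-\theta}[\neg\Psi\,\mathcal{U}^I\,\neg(\Phi \vee \Psi)]$$

$$\mathcal{P}_{\leq\theta}[\Phi\,\mathcal{W}^I\,\Psi] \equiv \mathcal{P}_{\geq 1-\theta}[\neg\Psi\,\mathcal{U}^I\,\neg(\Phi \vee \Psi)]$$

$$\mathcal{P}_{\bowtie\theta}[\Diamond^I\,\Phi] \equiv \mathcal{P}_{\bowtie\theta}[\top\,\mathcal{U}^I\,\Phi]$$

$$\mathcal{P}_{\bowtie\theta}[\Box^I\,\Phi] \equiv \mathcal{P}_{\bowtie\theta}[\Phi\,\mathcal{W}^I\,\bot]$$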


Unbounded  versions  of  these  path  operators  can  be  derived  in  the  same  way  as  for  X  and  U. 


4.3  UTSL  Semantics  and  Model  Checking  Problems 

The  validity  of  a  UTSL  formula  is  determined  relative  to  a  trajectory  prefix.  For  simple  UTSL  formulae  of 
the  form  x  ~  v,  the  validity  depends  only  on  the  last  state  of  the  trajectory  prefix,  but  this  is  not  necessarily 
the  case  for  UTSL  formulae  containing  one  or  more  probabilistic  operators.  The  formal  semantics  of  UTSL 
is  given  by  the  following  inductive  definition. 

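Definition 4.2 (UTSL Semantics). Let $\mathcal{M} = (S, T, \mu, SV, V)$ be a factored stochastic discrete event system. With $Path(\sigma_{\leq\tau})$ denoting the set of trajectories with common prefix $\sigma_{\leq\tau}$ and the definition of $T_k$ given by (2.18), satisfaction relations for UTSL formulae and path formulae are inductively defined by the following rules:

$\mathcal{M}, \{(s_0, t_0), \ldots, (s_k, t_k)\} \models x \sim v$ if $V(s_k, x) \sim v$

$\mathcal{M}, \sigma_{\leq\tau} \models \neg\Phi$ if $\mathcal{M}, \sigma_{\leq\tau} \not\models \Phi$

$\mathcal{M}, \sigma_{\leq\tau} \models \Phi \wedge \Psi$ if $(\mathcal{M}, \sigma_{\leq\tau} \models \Phi) \wedge (\mathcal{M}, \sigma_{\leq\tau} \models \Psi)$

$\mathcal{M}, \sigma_{\leq\tau} \models \mathcal{P}_{\bowtie\theta}[\varphi]$ if $\mu(\{\sigma \in Path(\sigma_{\leq\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}) \bowtie \theta$

$\mathcal{M}, \sigma, \tau \models \mathcal{X}^I\,\Phi$ if $\exists k \in \mathbb{N}.\,((T_{k-1} \leq \tau) \wedge (\tau < T_k) \wedge (T_k - \tau \in I) \wedge (\mathcal{M}, \sigma_{\leq T_k} \models \Phi))$

$\mathcal{M}, \sigma, \tau \models \Phi\,\mathcal{U}^I\,\Psi$ if $\exists t \in I.\,((\mathcal{M}, \sigma_{\leq\tau+t} \models \Psi) \wedge \forall t' \in T.\,((t' < t) \rightarrow (\mathcal{M}, \sigma_{\leq\tau+t'} \models \Phi)))$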

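Definition 4.2 specifies the validity of a UTSL formula at any time during execution of a stochastic discrete event system. We typically want to know whether a property $\Phi$ holds for a model $\mathcal{M}$ if execution starts in a specific state $s$. The triple $(\mathcal{M}, s, \Phi)$ is a model checking problem with an affirmative answer if and only if $\mathcal{M}, \{(s, 0)\} \models \Phi$. More generally, we can define the validity of a UTSL formula relative to a probability measure $\mu_0$, such that $\mu_0(S')$ is the probability that execution starts in a state $s \in S'$. This is accomplished with the addition of the following rules:

$\mathcal{M}, \mu_0 \models x \sim v$ if $\forall s \in \operatorname{supp} \mu_0.\,(\mathcal{M}, \{(s, 0)\} \models x \sim v)$

$\mathcal{M}, \mu_0 \models \neg\Phi$ if $\mathcal{M}, \mu_0 \not\models \Phi$

$\mathcal{M}, \mu_0 \models \Phi \wedge \Psi$ if $(\mathcal{M}, \mu_0 \models \Phi) \wedge (\mathcal{M}, \mu_0 \models \Psi)$

$\mathcal{M}, \mu_0 \models \mathcal{P}_{\bowtie\theta}[\varphi]$ if $\int_S \mu(\{\sigma \in Path(\{(s, 0)\}) \mid \mathcal{M}, \sigma, 0 \models \varphi\})\,d\mu_0(s) \bowtie \theta$

The probability integral in the last rule reduces to $\sum_{s \in S} \mu_0(s)\,\mu(\{\sigma \in Path(\{(s, 0)\}) \mid \mathcal{M}, \sigma, 0 \models \varphi\})$ if the state space $S$ is countable. A UTSL model checking problem can now be specified as a triple $(\mathcal{M}, \mu_0, \Phi)$. This definition subsumes the definition with a single initial state $s$, which corresponds to a measure $\mu_0$ that assigns probability one to $s$.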

The semantics of $\Phi\,\mathcal{U}^I\,\Psi$ requires that $\Phi$ holds continuously, i.e. at every point in time, along a trajectory until $\Psi$ is satisfied. If $\Phi$ and $\Psi$ are both free of any probabilistic operators, however, then the truth values of these subformulae do not depend on the amount of time that is spent in a specific state. Without nested probabilistic operators, it is therefore sufficient to verify the subformulae $\Phi$ and $\Psi$ at the entry of each state along a trajectory. The same can be said for stochastic discrete event systems that satisfy the Markov property (2.20), even with nested probabilistic operators present, because the Markov property ensures that the amount of time spent in a state does not have an impact on the future behavior of the process.

Figure 4.1: A simple two-state semi-Markov process. The time from when the left state ($s_0$) is entered until the transition to the right state ($s_1$) occurs is a random variable with distribution $G$.

In general, with nested probabilistic operators and without the Markov assumption, it may not be sufficient to verify the subformulae $\Phi$ and $\Psi$ of $\Phi\,\mathcal{U}^I\,\Psi$ at the time of state transitions. We illustrate this with two simple examples. The first example shows that a UTSL formula can be true at the entry of a state without holding continuously while remaining in the same state. The second example shows that a UTSL formula can become true while remaining in a state without being true at the entry of the state. It should be noted that the statistical model checking algorithm we present in the next chapter can deal with nested probabilistic operators only if it is sufficient to verify nested formulae at discrete points along a trajectory, which generally means that we must be dealing with a discrete-time model or a model satisfying the Markov property.
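When $\Phi$ and $\Psi$ contain no probabilistic operators, checking $\Phi\,\mathcal{U}^I\,\Psi$ over a single trajectory requires evaluating the subformulae only once per visited state, as noted above. The sketch below illustrates this for a trajectory given as a list of (state, sojourn time) pairs, with $\Phi$ and $\Psi$ as Python predicates on states and $I$ simplified to a closed interval $[a, b]$; these representations and the function names are assumptions. The derived operators follow the single-trajectory reading of the equivalences in Section 4.2: $\Diamond^I\,\Psi$ is $\top\,\mathcal{U}^I\,\Psi$, and $\Box^I\,\Phi$ is the negation of $\Diamond^I\,\neg\Phi$.

```python
import math


def check_until(trajectory, phi, psi, a, b):
    """Does Phi U^[a,b] Psi hold at time 0 along a piecewise-constant trajectory?

    trajectory -- list of (state, sojourn_time) pairs; the last sojourn time
                  may be math.inf
    phi, psi   -- predicates on states (no nested probabilistic operators)
    """
    start = 0.0
    for state, duration in trajectory:
        end = start + duration
        if start > b:
            return False  # the time bound has expired
        if psi(state):
            # Earliest instant in [start, end) that also lies in [a, b].
            first = max(start, a)
            # If first > start, Phi must also hold in this state on [start, first).
            if first < end and first <= b and (first == start or phi(state)):
                return True
        if not phi(state):
            return False  # Phi violated before Psi was satisfied in time
        start = end
    return False


def eventually(trajectory, psi, a, b):
    # <>^[a,b] Psi is true U^[a,b] Psi.
    return check_until(trajectory, lambda s: True, psi, a, b)


def always(trajectory, phi, a, b):
    # []^[a,b] Phi is the negation of <>^[a,b] (not Phi) on a single trajectory.
    return not eventually(trajectory, lambda s: not phi(s), a, b)


traj = [({"x": 0}, 0.5), ({"x": 1}, math.inf)]
x0 = lambda s: s["x"] == 0
x1 = lambda s: s["x"] == 1
print(check_until(traj, x0, x1, 0, 1))    # True: x=1 holds from time 0.5 on
print(check_until(traj, x0, x1, 0.7, 1))  # False: x=0 fails during [0.5, 0.7)
print(always(traj, x0, 0, 0.4))           # True
```

The second query shows why the left end of $I$ matters: $\Psi$ is true at time 0.5, but the earliest admissible instant is 0.7, and $\Phi$ must hold continuously until then.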

Example 4.1. Consider the semi-Markov process with two states depicted in Figure 4.1. Assume that $G$ is a standard Weibull distribution with shape parameter 0.5, denoted $W(1, 0.5)$, and that we want to verify the UTSL formula $\Phi = \mathcal{P}_{\geq 0.5}[\varphi]$, where $\varphi$ is the path formula $\mathcal{P}_{\geq 0.5}[x{=}0\,\mathcal{U}^{[0,1]}\,x{=}1]\,\mathcal{U}^{[0,1]}\,x{=}1$, relative to the trajectory prefix $\{(s_0, 0)\}$.

To solve this problem, we compute the probability measure of the set of trajectories that start in $s_0$ at time 0 and satisfy the path formula $\varphi$. Let $P$ denote this set. Members of $P$ are of the form $\{(s_0, t), (s_1, \infty)\}$ with $t \in [0, t']$ for some $t' \leq 1$. The probability measure of $P$ is therefore at most $F(1) \approx 0.632$, where $F(\cdot)$ is the cumulative distribution function for $W(1, 0.5)$. Of the trajectories with $t \in [0, 1]$, only the ones where $\Psi = \mathcal{P}_{\geq 0.5}[x{=}0\,\mathcal{U}^{[0,1]}\,x{=}1]$ holds until $s_1$ is reached satisfy the path formula $\varphi$.

If we require $\Psi$ to hold continuously along a trajectory until $s_1$ is reached, then we have to rule out trajectories with $t > t'$ such that $\Psi$ does not hold if verified relative to the trajectory prefix $\{(s_0, t')\}$. The probability of reaching $s_1$ within 1 time unit, given that we have already spent $t'$ time units in $s_0$, is given by the formula

$$q(t') = \frac{1}{1 - F(t')} \int_{t'}^{t'+1} f(x)\,dx,$$

where $f(\cdot)$ is the probability density function for $W(1, 0.5)$. The value of $q$ is greater than 0.5 for $t' = 0.1$, but less than 0.5 for $t' = 0.2$. Since $q$ is a decreasing function of $t'$, it means that $\Psi$ does not hold continuously over trajectories staying in $s_0$ if $t > 0.2$. It follows that the probability measure of the set $P$ is less than $F(0.2) \approx 0.361$, so $\Phi$ does not hold. We would reach the opposite conclusion if we simply verified the nested formulae at the entry of each state, since $\Psi$ holds initially in $s_0$.
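The numbers in Example 4.1 are easy to check. The sketch below uses the closed form $F(x) = 1 - e^{-(x/\lambda)^k}$ for the Weibull distribution $W(\lambda, k)$ with scale $\lambda$ and shape $k$, so that $q(t') = (F(t'+1) - F(t'))/(1 - F(t'))$, and locates by bisection the time $t^*$ at which $q$ drops to 0.5, i.e. where $\Psi$ stops holding; the helper names are illustrative.

```python
import math


def weibull_cdf(x, scale, shape):
    """F(x) = 1 - exp(-(x / scale)^shape) for x >= 0."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def q(t_prime, scale=1.0, shape=0.5):
    """Probability of reaching s1 within 1 time unit after t_prime time units in s0."""
    F = lambda x: weibull_cdf(x, scale, shape)
    return (F(t_prime + 1.0) - F(t_prime)) / (1.0 - F(t_prime))


def crossing(lo, hi, level=0.5, iters=60):
    """Bisection for q(t) = level, assuming q is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if q(mid) > level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(weibull_cdf(1.0, 1.0, 0.5), 3))  # 0.632
print(round(q(0.1), 3), round(q(0.2), 3))    # 0.519 0.477
print(round(weibull_cdf(0.2, 1.0, 0.5), 3))  # 0.361
t_star = crossing(0.1, 0.2)
print(round(t_star, 3))                      # 0.14
print(weibull_cdf(t_star, 1.0, 0.5) < weibull_cdf(0.2, 1.0, 0.5))  # True
```

Since $\Psi$ already fails once $t'$ exceeds $t^* \approx 0.14$, the probability measure of $P$ is at most $F(t^*)$, which is indeed below the bound $F(0.2)$ used in the example.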

Example 4.2. Consider the same two-state semi-Markov process as in the previous example, but this time with $G$ equal to $W(1, 1.5)$. Assume that we want to verify the UTSL formula $\Phi = \mathcal{P}_{\ge 0.5}[\varphi]$, where $\varphi$ is the path formula $x{=}0\ \mathcal{U}^{[0,1]}\ \mathcal{P}_{\ge 0.7}[x{=}0\ \mathcal{U}^{(0,1]}\ x{=}1]$. Note that the time interval is open to the left in the formula $\Psi = \mathcal{P}_{\ge 0.7}[x{=}0\ \mathcal{U}^{(0,1]}\ x{=}1]$. Therefore $\Psi$ cannot hold in $s_1$, because $x{=}0$ must hold at the entry of a state for $\Psi$ to hold in that state. $\Psi$ does not hold immediately in $s_0$ either: at time 0 in $s_0$, the probability of reaching $s_1$ within 1 time unit is $F(1) \approx 0.632 < 0.7$. The formula $\Psi$ does become true, however, along trajectories that remain in $s_0$ for 0.2 time units or more before transitioning to $s_1$. Since $F(1) - F(0.2) \approx 0.547 > 0.5$, it follows that $\Phi$ holds with the semantics given by Definition 4.2.
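The time at which $\Psi$ starts to hold can be located numerically. With shape 1.5 the hazard rate increases, so $q(t')$ increases in $t'$ and a bisection on $q(t') \ge 0.7$ finds the crossing point. The sketch below is illustrative. It uses hypothetical helper names and the same Weibull parameterization as before (scale 1, shape 1.5). It should report a value a little below 0.2, consistent with $\Psi$ holding once 0.2 time units have been spent in $s_0$.

```python
import math


def weibull_cdf(x: float, scale: float, shape: float) -> float:
    """CDF of W(scale, shape); zero for x <= 0."""
    return 0.0 if x <= 0.0 else 1.0 - math.exp(-((x / scale) ** shape))


def q(t_prime: float, scale: float, shape: float) -> float:
    """Probability of reaching s1 within 1 time unit after t_prime units in s0."""
    f0 = weibull_cdf(t_prime, scale, shape)
    return (weibull_cdf(t_prime + 1.0, scale, shape) - f0) / (1.0 - f0)


def first_time_above(threshold: float, scale: float, shape: float,
                     lo: float = 0.0, hi: float = 1.0, tol: float = 1e-9) -> float:
    """Bisection for the earliest t' in [lo, hi] with q(t') >= threshold.

    Assumes q is increasing on [lo, hi] (true for shape > 1) and that
    q(lo) < threshold <= q(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, scale, shape) >= threshold:
            hi = mid
        else:
            lo = mid
    return hi


# Example 4.2: G = W(1, 1.5); Psi has probability threshold 0.7.
t_star = first_time_above(0.7, 1.0, 1.5)
print(f"Psi holds after about {t_star:.3f} time units in s0")
print(f"F(1) - F(0.2) = {weibull_cdf(1.0, 1.0, 1.5) - weibull_cdf(0.2, 1.0, 1.5):.3f}")
```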

Our semantics for time-bounded until is consistent with that of TCTL defined by Alur et al. (1991). Infante Lopez et al. (2001) propose a semantics of CSL for semi-Markov processes that does not require subformulae to hold continuously in a state along a trajectory. With their semantics, one would get the opposite result in both of the examples above. The semantics of Infante Lopez et al. makes it easier to verify properties with nested probabilistic operators. However, it is not consistent with the common definition of a trajectory for a continuous-time discrete event system as a piecewise linear function of time. Furthermore, one could imagine using phase-type distributions to approximate the Weibull distributions in the two examples and verifying the properties for the resulting Markov processes. The introduction of phase transitions would result in nested formulae possibly being verified at different times in the same state, which is inconsistent with the semantics of Infante Lopez et al.


Chapter  5 


Statistical  Probabilistic  Model  Checking 


This  chapter  presents  a  statistical  approach  to  probabilistic  model  checking,  employing  hypothesis  testing 
and  discrete  event  simulation.  The  proposed  solution  method  works  for  any  discrete  event  system  that  can 
be  simulated,  although  the  method  for  verifying  properties  with  nested  probabilistic  statements  is  limited  to 
discrete-time  systems  or  systems  satisfying  the  Markov  property.  We  prove  two  fundamental  theorems  that 
establish  efficient  verification  procedures  for  conjunctive  and  nested  probabilistic  statements,  we  discuss 
benefits  and  hazards  of  using  distributed  sampling,  and  we  provide  complexity  results  for  the  statistical 
solution  method. 

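Consider the UTSL model checking problem $(\mathcal{M}, \mu_0, \mathcal{P}_{\bowtie\theta}[\varphi])$. The set of trajectories that satisfy $\varphi$ and whose initial state is distributed according to $\mu_0$ has probability measure

$$p = \int_S \mu\bigl(\{\sigma \in \mathit{Path}(\{(s, 0)\}) \mid \mathcal{M}, \sigma, 0 \models \varphi\}\bigr)\, d\mu_0(s).$$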

We could solve the model checking problem by computing $p$ and then comparing it to the threshold $\theta$. However, a numerical computation of $p$ is not feasible for certain classes of stochastic discrete event systems, in particular many infinite-state systems and generalized semi-Markov processes. For Markov processes, efficient numerical techniques for computing $p$ do exist (Hansson and Jonsson 1994; Baier et al. 2003). The computational complexity of these techniques is proportional to the size of the state space, which limits their applicability for verifying properties of stochastic systems with large state spaces.

Simulation has often been advertised as a last resort when numerical techniques fail (see, e.g., Teichroew and Lubin 1966; Buchholz 1998), and it is a technique with roots in the infancy of computer science. The Monte Carlo method (Ulam and von Neumann 1947; Metropolis and Ulam 1949) is essentially a statistical approach to the study of integro-differential equations. S. Ulam conceived it in 1946 to solve problems in mathematical physics on ENIAC, the first general-purpose electronic digital computer (Metropolis 1987; Eckhardt 1987).

It is therefore reasonable to consider statistical techniques, involving simulation and sampling, to solve UTSL model checking problems. For this purpose, we set up a chance experiment represented by Bernoulli variates $X_i$ with parameter $p$. We could then proceed by estimating $p$ with a confidence interval, using techniques for estimating the mean of a distribution with unknown variance (see, e.g., Chow and Robbins 1965; Nadas 1969; Raatikainen 1995). Note, however, that to verify the UTSL formula $\mathcal{P}_{\bowtie\theta}[\varphi]$, we do not need an accurate estimate of $p$; we need to know only whether $p$ is above or below the threshold $\theta$. It would be a waste of effort to obtain an accurate estimate of $p$, only to realize that $p$ is far from $\theta$.
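The cost of estimation can be quantified with a simple bound. The sketch below uses Hoeffding's inequality, a distribution-free bound, rather than the unknown-variance procedures cited above. It computes the fixed number of observations that guarantees an estimate within $\pm\epsilon$ of $p$ with confidence $1 - \alpha$. The function name is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, alpha: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= alpha (Hoeffding's inequality).

    With n independent Bernoulli observations, the sample mean p_hat then
    satisfies Pr[|p_hat - p| >= epsilon] <= alpha.
    """
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))


for eps in (0.1, 0.05, 0.01):
    print(f"epsilon = {eps}: n = {hoeffding_sample_size(eps, 0.05)}")
```

The required sample size grows as $1/\epsilon^2$ no matter how far $p$ is from $\theta$. A test that only needs to decide which side of $\theta$ the value lies on can avoid this cost.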

In Section 2.2, we discussed acceptance sampling, a statistical technique for testing whether the parameter $p$ of a Bernoulli variate is above or below a threshold $\theta$. This is exactly what we need for UTSL model checking. The verification of a probabilistic UTSL formula, for example $\Phi = \mathcal{P}_{\ge\theta}[\varphi]$, can be thought of in terms of hypothesis testing. To verify $\Phi$, we need to test the hypothesis $H: p \ge \theta$ against the alternative hypothesis $K: p < \theta$. We first restrict our attention to UTSL formulae without nested probabilistic operators. In Section 5.2, we consider the general case. There we show how nested probabilistic operators can be dealt with using statistical techniques, at least for certain classes of stochastic discrete event systems.


5.1  Model  Checking  without  Nested  Probabilistic  Operators 


To use acceptance sampling for UTSL model checking, we need to introduce the concept of an indifference region. With each formula of the form $\mathcal{P}_{\bowtie\theta}[\varphi]$, we associate an indifference region centered around $\theta$ with half-width $\delta(\theta)$. The half-width can be a constant, such as $10^{-1}$, but it is sometimes desirable to let the half-width be a function of $\theta$. A reasonable choice in that case is

$$\delta(\theta) = \begin{cases} 2\delta_0\theta & \text{if } \theta < 0.5\\ 2\delta_0(1 - \theta) & \text{if } \theta \ge 0.5 \end{cases} \tag{5.1}$$

which makes the half-width $\delta_0$ if $\theta$ is 0.5 and smaller if $\theta$ is close to 0 or 1. We modify the semantics of UTSL to account for indifference regions. This is done by replacing the satisfaction relation $\models$ with two relations, $\mathrel{|\!\approx_\top}$ and $\mathrel{|\!\approx_\bot}$. These represent satisfaction and unsatisfaction, respectively, for UTSL formulae when indifference regions are taken into account. The relations $\mathrel{|\!\approx_\top}$ and $\mathrel{|\!\approx_\bot}$ are mutually exclusive, but not exhaustive, so $\mathrel{|\!\approx_\bot}$ is not equivalent to the negation of $\mathrel{|\!\approx_\top}$.
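Equation (5.1) translates directly into code. The Python sketch below uses hypothetical function names. Its default $\delta_0 = 0.1$ is the value used in Example 5.1 later in this chapter. The sketch returns the half-width and the resulting open indifference region for a threshold $\theta$.

```python
def half_width(theta: float, delta0: float = 0.1) -> float:
    """Half-width delta(theta) of the indifference region, following (5.1)."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be a probability")
    return 2.0 * delta0 * theta if theta < 0.5 else 2.0 * delta0 * (1.0 - theta)


def indifference_region(theta: float, delta0: float = 0.1) -> tuple[float, float]:
    """Open interval (theta - delta(theta), theta + delta(theta))."""
    d = half_width(theta, delta0)
    return theta - d, theta + d


print(indifference_region(0.5))
print(indifference_region(0.75))
```

The half-width shrinks linearly toward the ends of $[0, 1]$. As a result, the region never extends below 0 or above 1.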

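Definition 5.1 (UTSL Semantics with Indifference Regions). Let $\mathcal{M} = (S, T, \mu, SV, V)$ be a factored stochastic discrete event system, and let $\delta(\theta)$ be a function determining the half-width of an indifference region centered around $\theta$. A satisfaction relation $\mathrel{|\!\approx_\top}$ and an unsatisfaction relation $\mathrel{|\!\approx_\bot}$ for UTSL with indifference regions are simultaneously defined by induction as follows:

$$
\begin{array}{lcl}
\mathcal{M}, \{(s_0, t_0), \dots, (s_k, t_k)\} \mathrel{|\!\approx_\top} x \sim v & \text{if} & V(s_k, x) \sim v\\
\mathcal{M}, \{(s_0, t_0), \dots, (s_k, t_k)\} \mathrel{|\!\approx_\bot} x \sim v & \text{if} & V(s_k, x) \not\sim v\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \neg\Phi & \text{if} & \mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \neg\Phi & \text{if} & \mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi \wedge \Psi & \text{if} & (\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Psi)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi \wedge \Psi & \text{if} & (\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \vee (\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Psi)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\ge\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}) \ge \theta + \delta(\theta)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\ge\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}) \le \theta - \delta(\theta)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\le\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}) \le \theta - \delta(\theta)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\le\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}) \ge \theta + \delta(\theta)
\end{array}
$$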


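It should be clear from Definitions 4.2 and 5.1 that, for any UTSL formula $\Phi$, $(\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \Rightarrow (\mathcal{M}, \sigma_{\le\tau} \models \Phi)$ and $(\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \Rightarrow (\mathcal{M}, \sigma_{\le\tau} \not\models \Phi)$. However, the converse does not hold in general. For example, because of the indifference regions, $\mathcal{M}, \sigma_{\le\tau} \models \Phi$ can be satisfied without $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi$ being satisfied. In fact, the triple $(\mathcal{M}, \sigma_{\le\tau}, \mathcal{P}_{\bowtie\theta}[\varphi])$ belongs to neither of the two relations $\mathrel{|\!\approx_\top}$ and $\mathrel{|\!\approx_\bot}$ if $\mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\})$ falls into the indifference region for $\mathcal{P}_{\bowtie\theta}[\varphi]$, i.e., is less than $\delta(\theta)$ away from $\theta$.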
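If the exact probability were known, Definition 5.1 would make the probabilistic operator three-valued. The hypothetical helper below maps that probability to one of the two relations or to the indifference region. It is a sketch only: in practice the exact probability is precisely what we cannot compute. The half-width $\delta(\theta)$ is passed in as a parameter.

```python
from enum import Enum


class Verdict(Enum):
    SATISFIED = "satisfaction relation |≈⊤"
    UNSATISFIED = "unsatisfaction relation |≈⊥"
    INDIFFERENT = "neither (indifference region)"


def classify(p: float, op: str, theta: float, delta: float) -> Verdict:
    """Classify P_{op theta}[phi] per Definition 5.1.

    p is the exact measure of paths satisfying phi, op is ">=" or "<=",
    and delta is the half-width delta(theta).
    """
    if op == ">=":
        if p >= theta + delta:
            return Verdict.SATISFIED
        if p <= theta - delta:
            return Verdict.UNSATISFIED
    elif op == "<=":
        if p <= theta - delta:
            return Verdict.SATISFIED
        if p >= theta + delta:
            return Verdict.UNSATISFIED
    else:
        raise ValueError(f"unsupported comparison: {op}")
    return Verdict.INDIFFERENT


print(classify(0.55, ">=", 0.5, 0.1))
```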

Since we are resorting to statistical techniques for solving UTSL model checking problems, we must accept that we sometimes produce an incorrect answer. This is satisfactory, so long as we can guarantee certain a priori bounds on the probability of an incorrect result. Simply put, we want the probability of accepting a UTSL formula as true when it is false (or vice versa) to be below a predetermined threshold. To be precise, we want our statistical model checking algorithm to accept a UTSL formula $\Phi$ as true with probability at least $1 - \alpha$ if $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi$ holds. Conversely, the probability that $\Phi$ is accepted as true should be at most $\beta$ if $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi$ holds. Let $\mathcal{M}, \sigma_{\le\tau} \vdash \Phi$ represent the fact that our model checking algorithm accepts $\Phi$ as true. Let $\mathcal{M}, \sigma_{\le\tau} \dashv \Phi$ represent the fact that our algorithm rejects $\Phi$ as false. For the remainder of this section, we will often leave out $\mathcal{M}$ from relations for the sake of brevity. The requirements for our algorithm can then be summarized by the following two conditions:

$$\Pr[\sigma_{\le\tau} \vdash \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi] \ge 1 - \alpha \tag{5.2}$$

$$\Pr[\sigma_{\le\tau} \vdash \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi] \le \beta \tag{5.3}$$

We require that our model checking algorithm always produce a result, i.e., that it either accepts a UTSL formula as true or rejects it as false. In other words, the algorithm is required to satisfy the condition $\neg(\sigma_{\le\tau} \vdash \Phi) \iff (\sigma_{\le\tau} \dashv \Phi)$. It follows from this requirement that

$$\Pr[\sigma_{\le\tau} \dashv \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi] \le \alpha \tag{5.4}$$

is equivalent to condition (5.2). The parameter $\alpha$ can be interpreted as a bound on the probability of a type I error (false negative), and $\beta$ can be thought of as a bound on the probability of a type II error (false positive). This interpretation presumes that we do not consider it an error to produce an incorrect answer when some of the probabilities fall into an indifference region. By narrowing the indifference regions for probabilistic UTSL operators, we can get arbitrarily close to a statistical algorithm that implements the true semantics for UTSL given by Definition 4.2. This will most certainly come at a cost in efficiency.

Let us now consider the problem of verifying a UTSL formula $\Phi$ relative to a trajectory prefix so that conditions (5.4) and (5.3) are satisfied, under the assumption that $\Phi$ does not contain any nested probabilistic operators. To begin with, if $\Phi$ is of the form $x \sim v$, then it is trivial to satisfy the two conditions for any $\alpha$ and $\beta$. Given a trajectory prefix $\{(s_0, t_0), \dots, (s_k, t_k)\}$, we simply observe the value of $x$ in state $s_k$ and compare it to $v$. The probability of error in this case is always zero.

5.1.1  Probabilistic  Operator 

To verify the UTSL formula $\mathcal{P}_{\bowtie\theta}[\varphi]$, we introduce Bernoulli variates $X_i$ with parameter $p$, as stated in the introduction to this chapter, where $p$ is the probability measure of the set of trajectories that satisfy $\varphi$. An observation of $X_i$ can be obtained by first generating a trajectory for $\mathcal{M}$ using discrete event simulation and then verifying $\varphi$ over the sampled trajectory. If $\varphi$ does not contain any probabilistic operators, as is assumed for now, then we can verify $\varphi$ without any uncertainty in the result. If $\varphi$ is determined to hold over the sampled trajectory, then the observation is 1; otherwise it is 0.
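Here is a minimal sketch of how one observation of $X_i$ is produced. It simulates a hypothetical two-state semi-Markov process modeled on Examples 4.1 and 4.2: a Weibull holding time in $s_0$ (where $x{=}0$) and an absorbing $s_1$ (where $x{=}1$). It then checks $\varphi = x{=}0\ \mathcal{U}^{[0,1]}\ x{=}1$ on each sampled trajectory. All names are illustrative. The empirical fraction of 1s should be close to $F(1) = 1 - e^{-1} \approx 0.632$.

```python
import random


def sample_trajectory(rng: random.Random, scale: float = 1.0, shape: float = 0.5):
    """Hypothetical two-state semi-Markov process: s0 (x=0) -> s1 (x=1, absorbing).

    Returns the trajectory as (value of x, holding time) pairs; the absorbing
    state s1 is given an infinite holding time.
    """
    holding = rng.weibullvariate(scale, shape)  # alpha = scale, beta = shape
    return [(0, holding), (1, float("inf"))]


def observe(rng: random.Random, time_bound: float = 1.0) -> int:
    """One Bernoulli observation for phi = (x=0) U^[0, time_bound] (x=1)."""
    entry_time = 0.0
    for x, holding in sample_trajectory(rng):
        if entry_time > time_bound:
            return 0          # time bound passed before x=1 was reached
        if x == 1:
            return 1          # x=1 reached within the bound
        if x != 0:
            return 0          # left-hand side x=0 violated
        entry_time += holding
    return 0


rng = random.Random(2005)
n = 20_000
estimate = sum(observe(rng) for _ in range(n)) / n
print(f"fraction of sampled trajectories satisfying phi: {estimate:.3f}")
```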

While a trajectory for a stochastic discrete event system can be infinite, we assume that we never need to generate more than a finite prefix of a trajectory in order to determine the truth value of $\varphi$ over the entire trajectory. If $\varphi$ is $\mathcal{X}^{I}\,\Phi$, then this assumption holds with certainty, because we only need to simulate a single state transition. If $\varphi$ is $\Phi\ \mathcal{U}^{I}\ \Psi$, then the assumption holds if the probability is zero that an infinite number of state transitions occur before $\sup I$ time units have passed. A sufficient condition for this to be the case is that the system is non-explosive and $\sup I$ is finite, as stated by the following theorem.

Theorem 5.1 (Sufficient Conditions for Tractability). The probability is zero that an infinite trajectory is needed to determine the truth value of $\Phi\ \mathcal{U}^{I}\ \Psi$, with $\sup I < \infty$, for a non-explosive stochastic discrete event system.

Proof. Suppose we have not encountered a state satisfying $\neg\Phi \vee \Psi$ within the first $\sup I$ time units along a trajectory. Then we can conclude that $\Phi\ \mathcal{U}^{I}\ \Psi$ does not hold without having to look further along the trajectory. If the stochastic discrete event system is non-explosive, then the probability measure is zero for an infinite trajectory $\{(s_0, t_0), (s_1, t_1), \dots\}$ with $\sum_{i} t_i < \infty$. It follows that, with probability one, only a finite number of state transitions can occur within a finite interval of time, in particular the interval $[0, \sup I]$. Consequently, the probability is one that we can determine the truth value of $\Phi\ \mathcal{U}^{I}\ \Psi$ by looking at a finite prefix of a trajectory. □
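The argument in the proof corresponds to a checker that consumes a trajectory lazily and stops as soon as the outcome is determined. The Python sketch below assumes $I = [0, T]$ and state formulae $\Phi$, $\Psi$ without probabilistic operators. It represents a trajectory as an iterator of (state, holding time) pairs. The example trajectory and all names are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Tuple

State = dict  # hypothetical: a state maps variable names to values


def check_until(trajectory: Iterable[Tuple[State, float]],
                phi: Callable[[State], bool],
                psi: Callable[[State], bool],
                time_bound: float) -> bool:
    """Decide phi U^[0, time_bound] psi over a (possibly infinite) trajectory.

    Pairs are pulled one at a time, so only a finite prefix is generated
    whenever the entry times eventually exceed time_bound (Theorem 5.1).
    """
    entry_time = 0.0
    for state, holding in trajectory:
        if entry_time > time_bound:
            return False      # no state satisfying (not phi) or psi within the bound
        if psi(state):
            return True       # psi holds on entry, within the bound
        if not phi(state):
            return False      # phi violated before psi was reached
        entry_time += holding
    return False              # finite trajectory ended without psi


def counter_trajectory():
    """Hypothetical infinite trajectory: x counts up, one time unit per state."""
    for k in itertools.count():
        yield {"x": k}, 1.0


print(check_until(counter_trajectory(), lambda s: True, lambda s: s["x"] == 3, 5.0))
print(check_until(counter_trajectory(), lambda s: True, lambda s: s["x"] == 10, 5.0))
```

The second call terminates even though the trajectory is infinite. The loop stops at the first state entered after time 5.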

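We can now set up a hypothesis testing problem for verifying $\mathcal{P}_{\ge\theta}[\varphi]$. We should test the hypothesis $H_0: p \ge \theta + \delta(\theta)$ against the alternative hypothesis $H_1: p \le \theta - \delta(\theta)$ (for $\mathcal{P}_{\le\theta}[\varphi]$, we simply reverse the roles of the two hypotheses). The hypothesis $H_0$ holds if and only if $\sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\ge\theta}[\varphi]$ holds, and $H_1$ is similarly related to the judgment $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\ge\theta}[\varphi]$. Thus, by using an acceptance sampling test with strength $(\alpha, \beta)$ to decide $\sigma_{\le\tau} \vdash \mathcal{P}_{\ge\theta}[\varphi]$, we can satisfy conditions (5.4) and (5.3) with our model checking algorithm.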
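As one concrete acceptance sampling test, the sketch below implements Wald's sequential probability ratio test, one of the two tests used in the experiments of the next chapter. It tests $H_0: p \ge \theta + \delta(\theta)$ against $H_1: p \le \theta - \delta(\theta)$. The function name, the `observe` callback and the `max_samples` safety cap are assumptions of this sketch. It uses the textbook boundaries $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$, which meet the error bounds only approximately.

```python
import math
import random
from typing import Callable, Tuple


def sprt(observe: Callable[[], int], theta: float, delta: float,
         alpha: float, beta: float, max_samples: int = 1_000_000) -> Tuple[bool, int]:
    """Wald's SPRT for P_{>=theta}[phi] with indifference half-width delta.

    Tests H0: p >= theta + delta against H1: p <= theta - delta. `observe()`
    returns 1 if phi held on a freshly sampled trajectory, else 0.
    Returns (True if H0 is accepted, number of observations used).
    """
    p0, p1 = theta + delta, theta - delta
    if not (0.0 < p1 < p0 < 1.0):
        raise ValueError("indifference region must lie strictly inside (0, 1)")
    upper = math.log((1.0 - beta) / alpha)       # accept H1 at or above this
    lower = math.log(beta / (1.0 - alpha))       # accept H0 at or below this
    inc_one = math.log(p1 / p0)                  # log-likelihood ratio term for x = 1
    inc_zero = math.log((1.0 - p1) / (1.0 - p0))  # log-likelihood ratio term for x = 0
    log_ratio = 0.0
    for n in range(1, max_samples + 1):
        log_ratio += inc_one if observe() else inc_zero
        if log_ratio <= lower:
            return True, n
        if log_ratio >= upper:
            return False, n
    raise RuntimeError("no decision within max_samples observations")


# Usage with simulated observations whose true success probability is 0.9.
rng = random.Random(1)
accepted, used = sprt(lambda: int(rng.random() < 0.9), theta=0.5, delta=0.1,
                      alpha=0.05, beta=0.05)
print(accepted, used)
```

When $p$ is far from $\theta$, the log-likelihood ratio drifts quickly toward one boundary, so few observations are needed.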

5.1.2  Composite  State  Formulae 

To complete the model checking algorithm, we need to verify negated UTSL formulae and conjunctions of UTSL formulae. We take a compositional approach to the verification of such formulae. To verify $\neg\Phi$, we verify $\Phi$ and reverse the result. To verify a conjunction, we verify each conjunct separately. The following two rules formally define the behavior of the model checking algorithm:

$$
\begin{array}{lcl}
\mathcal{M}, \sigma_{\le\tau} \vdash \neg\Phi & \text{if} & \mathcal{M}, \sigma_{\le\tau} \dashv \Phi\\
\mathcal{M}, \sigma_{\le\tau} \vdash \Phi \wedge \Psi & \text{if} & (\mathcal{M}, \sigma_{\le\tau} \vdash \Phi) \wedge (\mathcal{M}, \sigma_{\le\tau} \vdash \Psi)
\end{array}
$$
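The two rules suggest a recursive verifier. The sketch below also anticipates the error-bound results established in this section: negation swaps the roles of $\alpha$ and $\beta$, and every conjunct reuses the conjunction's bounds. The `Prob` leaf is a hypothetical wrapper around any acceptance sampling test of strength $(\alpha, \beta)$. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Prob:
    """Hypothetical leaf for P_{op theta}[phi]: test(alpha, beta) returns True to accept."""
    test: Callable[[float, float], bool]


@dataclass
class Not:
    sub: "Formula"


@dataclass
class And:
    conjuncts: List["Formula"]


Formula = Union[Prob, Not, And]


def verify(formula: Formula, alpha: float, beta: float) -> bool:
    """Return True if the formula is accepted (|-), False if rejected (-|)."""
    if isinstance(formula, Prob):
        return formula.test(alpha, beta)
    if isinstance(formula, Not):
        # Negation: verify the subformula with the error bounds swapped.
        return not verify(formula.sub, beta, alpha)
    if isinstance(formula, And):
        # Conjunction: each conjunct uses the conjunction's own bounds.
        # all() stops at the first rejected conjunct; the decision is unchanged.
        return all(verify(c, alpha, beta) for c in formula.conjuncts)
    raise TypeError(f"unsupported formula: {formula!r}")


calls = []
leaf = Prob(lambda a, b: calls.append((a, b)) or True)  # always accepts, logs its bounds
print(verify(And([leaf, Not(leaf)]), 0.01, 0.05), calls)
```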

Next,  we  show  how  to  bound  the  probability  of  error  for  a  composite  UTSL  formula,  assuming  that  we  have 
bounds  for  the  probability  of  error  in  the  verification  results  for  the  subformulae. 

First, consider the verification of $\neg\Phi$, assuming we have already verified $\Phi$ so that conditions (5.4) and (5.3) are satisfied. Since we negate the verification result for $\Phi$, a type I error for $\Phi$ becomes a type II error for $\neg\Phi$, and a type II error for $\Phi$ becomes a type I error for $\neg\Phi$. To verify $\neg\Phi$ with error bounds $\alpha$ and $\beta$, we therefore have to verify $\Phi$ with error bounds $\beta$ and $\alpha$, as stated by the following proposition.

Proposition 5.2. To verify $\neg\Phi$ with type I error probability $\alpha$ and type II error probability $\beta$, it is sufficient to verify $\Phi$ with type I error probability $\beta$ and type II error probability $\alpha$.

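Proof. Assume that $\Pr[\sigma_{\le\tau} \dashv \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi] \le \beta$ and $\Pr[\sigma_{\le\tau} \vdash \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi] \le \alpha$. It follows from Definition 5.1 that $\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi \iff \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \neg\Phi$ and $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi \iff \sigma_{\le\tau} \mathrel{|\!\approx_\top} \neg\Phi$. Our model checking algorithm is such that $\sigma_{\le\tau} \dashv \Phi \iff \sigma_{\le\tau} \vdash \neg\Phi$ and $\sigma_{\le\tau} \vdash \Phi \iff \sigma_{\le\tau} \dashv \neg\Phi$. Consequently,

$$\Pr[\sigma_{\le\tau} \dashv \neg\Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \neg\Phi] = \Pr[\sigma_{\le\tau} \vdash \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi] \le \alpha$$

and

$$\Pr[\sigma_{\le\tau} \vdash \neg\Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \neg\Phi] = \Pr[\sigma_{\le\tau} \dashv \Phi \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi] \le \beta. \qquad \square$$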

Next, consider the verification of $\Phi \wedge \Psi$. The conjunction is determined to hold by our algorithm if and only if both $\Phi$ and $\Psi$ are determined to hold. A type I error occurs if we believe that at least one of $\Phi$ and $\Psi$ does not hold, when in reality both are true. A type II error occurs if we believe that both $\Phi$ and $\Psi$ hold, when at least one of the conjuncts actually is false. We will show that in order to verify a conjunction with error bounds $\alpha$ and $\beta$, it is sufficient to verify each conjunct with the same error bounds. To prove this, we use the following elementary lemma from probability theory.

Lemma 5.3. For arbitrary events $A$ and $B$, $\Pr[A \wedge B] \le \min(\Pr[A], \Pr[B])$.

Using  this  lemma,  we  can  derive  bounds  on  the  error  probabilities  associated  with  the  verification  of  a 
conjunction  based  on  error  bounds  for  the  verification  of  the  individual  conjuncts. 
Theorem 5.4 (Conjunction). If $\Phi_i$ is verified with type I error probability $\alpha_i$ and type II error probability $\beta_i$ for all $1 \le i \le n$, then $\Phi = \bigwedge_{i=1}^{n} \Phi_i$ can be verified with type I error probability $\min_{i \in \mathit{Rej}(\Phi)} \alpha_i$ and type II error probability $\max_{1 \le i \le n} \beta_i$, where $\mathit{Rej}(\bigwedge_{i=1}^{n} \Phi_i) = \{i \mid \sigma_{\le\tau} \dashv \Phi_i\}$.

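Proof by induction. If $n = 1$, then $\bigwedge_{i=1}^{1} \Phi_i = \Phi_1$. By assumption, $\Phi_1$ can be verified with type I error probability $\alpha_1 = \min_{i \in \mathit{Rej}(\Phi_1)} \alpha_i$ and type II error probability $\beta_1 = \max_{1 \le i \le 1} \beta_i$. (A type I error requires $\Phi_1$ to be rejected, in which case $\mathit{Rej}(\Phi_1) = \{1\}$.)

Assume that $\Phi = \bigwedge_{i=1}^{n} \Phi_i$, for some $n \ge 1$, can be verified with type I error probability $\alpha = \min_{i \in \mathit{Rej}(\Phi)} \alpha_i$ and type II error probability $\beta = \max_{1 \le i \le n} \beta_i$. Furthermore, assume that $\Pr[\sigma_{\le\tau} \dashv \Phi_{n+1} \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1}] \le \alpha_{n+1}$ and $\Pr[\sigma_{\le\tau} \vdash \Phi_{n+1} \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1}] \le \beta_{n+1}$.

It follows from Definition 5.1 that $\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi \wedge \Phi_{n+1} \iff (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})$. We thus have

$$\Pr[\sigma_{\le\tau} \dashv \Phi \wedge \Phi_{n+1} \mid \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi \wedge \Phi_{n+1}] = \Pr[\sigma_{\le\tau} \dashv \Phi \wedge \Phi_{n+1} \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})].$$

There are three ways in which a type I error can occur, i.e., in which our model checking algorithm can conclude $\sigma_{\le\tau} \dashv \Phi \wedge \Phi_{n+1}$:

1. If both $\Phi$ and $\Phi_{n+1}$ are verified to be false, then the probability above equals
$$\Pr[(\sigma_{\le\tau} \dashv \Phi) \wedge (\sigma_{\le\tau} \dashv \Phi_{n+1}) \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})],$$
which by Lemma 5.3 is at most
$$\min\bigl(\Pr[\sigma_{\le\tau} \dashv \Phi \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})],\ \Pr[\sigma_{\le\tau} \dashv \Phi_{n+1} \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})]\bigr).$$
By assumption, this is at most $\min(\alpha, \alpha_{n+1}) = \min_{i \in \mathit{Rej}(\Phi \wedge \Phi_{n+1})} \alpha_i$.

2. If $\Phi$ is verified to be false and $\Phi_{n+1}$ is verified to be true, then the probability above equals
$$\Pr[(\sigma_{\le\tau} \dashv \Phi) \wedge (\sigma_{\le\tau} \vdash \Phi_{n+1}) \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})] \le \min(\alpha, 1) = \alpha.$$
If $\Phi_{n+1}$ is verified as true, then $n+1 \notin \mathit{Rej}(\Phi \wedge \Phi_{n+1})$. We therefore have $\alpha = \min_{i \in \mathit{Rej}(\Phi \wedge \Phi_{n+1})} \alpha_i$.

3. If $\Phi$ is verified to be true and $\Phi_{n+1}$ is verified to be false, then the probability above equals
$$\Pr[(\sigma_{\le\tau} \vdash \Phi) \wedge (\sigma_{\le\tau} \dashv \Phi_{n+1}) \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi_{n+1})] \le \min(1, \alpha_{n+1}) = \alpha_{n+1}.$$
If $\Phi$ is verified as true, then $\mathit{Rej}(\Phi) = \emptyset$. We therefore have $\alpha_{n+1} = \min_{i \in \mathit{Rej}(\Phi \wedge \Phi_{n+1})} \alpha_i$.

In all three cases the probability of a type I error is bounded by $\min_{i \in \mathit{Rej}(\Phi \wedge \Phi_{n+1})} \alpha_i$, as required.

Our model checking algorithm will conclude $\sigma_{\le\tau} \vdash \Phi \wedge \Phi_{n+1}$ if and only if it can conclude both $\sigma_{\le\tau} \vdash \Phi$ and $\sigma_{\le\tau} \vdash \Phi_{n+1}$. There are three ways in which this can lead to a type II error:

1. If both $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi$ and $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1}$ hold, then the probability of a type II error is
$$\Pr[(\sigma_{\le\tau} \vdash \Phi) \wedge (\sigma_{\le\tau} \vdash \Phi_{n+1}) \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1})].$$
By Lemma 5.3, this is at most
$$\min\bigl(\Pr[\sigma_{\le\tau} \vdash \Phi \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1})],\ \Pr[\sigma_{\le\tau} \vdash \Phi_{n+1} \mid (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \wedge (\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1})]\bigr),$$
which in turn is at most $\min(\beta, \beta_{n+1})$ by assumption.

2. If $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi$ holds, but not $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1}$, then the probability of a type II error is $\Pr[(\sigma_{\le\tau} \vdash \Phi) \wedge (\sigma_{\le\tau} \vdash \Phi_{n+1}) \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi] \le \min(\beta, 1) = \beta$.

3. If $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1}$ holds, but not $\sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi$, then the probability of a type II error is $\Pr[(\sigma_{\le\tau} \vdash \Phi) \wedge (\sigma_{\le\tau} \vdash \Phi_{n+1}) \mid \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi_{n+1}] \le \min(1, \beta_{n+1}) = \beta_{n+1}$.

We take the maximum over the three cases to obtain the bound $\max_{1 \le i \le n+1} \beta_i$. □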

Intuitively, we can explain Theorem 5.4 as follows. To conclude that $\Phi \wedge \Psi$ does not hold, we only need to be convinced that one of the conjuncts does not hold. We can base the decision for the conjunction solely on the rejection of a single conjunct. In that case, the probability of a type I error is the same for the conjunction as for the rejected conjunct. We get $\min_{i \in \mathit{Rej}(\Phi \wedge \Psi)} \alpha_i$ by basing our decision for the entire conjunction on the rejected conjunct that was verified with the smallest probability of a type I error. To conclude that $\Phi \wedge \Psi$ holds, we must be convinced that both conjuncts hold. We get a type II error if at least one of the conjuncts does not hold and we accept the conjunction as true. If $\Phi$ does not hold, the probability of a type II error for the conjunction is bounded by the type II error probability for $\Phi$. If $\Psi$ does not hold, the type II error probability for $\Psi$ bounds the type II error probability for the conjunction. Since we cannot know whether $\Phi$ or $\Psi$ is actually false, we know only that the type II error probability is at most the maximum of the type II error probabilities for the conjuncts.

If we knew that the verification results for the individual conjuncts were obtained independently, then we could bound the type I error probability for the verification of the conjunction by $\prod_{i \in \mathit{Rej}(\Phi)} \alpha_i$. Theorem 5.4, however, does not make any assumptions regarding independence. For example, if the same set of sampled trajectories were used to verify all of the conjuncts, then the verification results for the individual conjuncts would not be independent.
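The bookkeeping in Theorem 5.4 can be spelled out in a few lines. So can the tighter product bound that would hold only under independence. The Python helpers below are hypothetical: they use 0-based indices and the error bounds of Example 5.1 in the usage lines. `rejected` plays the role of $\mathit{Rej}(\Phi)$.

```python
import math
from typing import Iterable, Sequence, Tuple


def conjunction_error_bounds(alphas: Sequence[float], betas: Sequence[float],
                             rejected: Iterable[int]) -> Tuple[float, float]:
    """(type I bound, type II bound) for a conjunction, per Theorem 5.4.

    No type I error is possible when no conjunct was rejected (the
    conjunction is then accepted), so the type I bound is 0 in that case.
    """
    rej = list(rejected)
    type1 = min(alphas[i] for i in rej) if rej else 0.0
    return type1, max(betas)


def independent_type1_bound(alphas: Sequence[float], rejected: Iterable[int]) -> float:
    """Product bound; valid only if the conjunct results are independent."""
    rej = list(rejected)
    return math.prod(alphas[i] for i in rej) if rej else 0.0


alphas, betas = [0.01, 0.03], [0.04, 0.02]
print(conjunction_error_bounds(alphas, betas, [1]))     # only the second conjunct rejected
print(conjunction_error_bounds(alphas, betas, [0, 1]))  # both rejected
print(independent_type1_bound(alphas, [0, 1]))
```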

Example 5.1. Consider the UTSL formula $\Phi = \mathcal{P}_{\ge 0.5}[\varphi_1] \wedge \mathcal{P}_{\ge 0.75}[\varphi_2]$. Let $\alpha_1 = 0.01$ and $\beta_1 = 0.04$ be the error bounds used to verify the first conjunct, and let $\alpha_2 = 0.03$ and $\beta_2 = 0.02$ be the error bounds used to verify the second conjunct. Furthermore, let $p_i$ be the probability measure of the set of trajectories that start in $s_0$ at time 0 and satisfy $\varphi_i$, for $i \in \{1, 2\}$. We assume that the function in (5.1), with $\delta_0 = 0.1$, is used to determine the indifference region for each probabilistic operator. This gives the indifference region $(0.4, 0.6)$ for the first conjunct and $(0.7, 0.8)$ for the second conjunct. Suppose $p_1 \ge 0.6$ and $p_2 \ge 0.8$. According to Theorem 5.4, the probability of rejecting the conjunction as false is then at most $\max(\alpha_1, \alpha_2) = 0.03$. The bound is $\min_{i \in \mathit{Rej}(\Phi)} \alpha_i$, and in the worst case only the second conjunct is rejected. On the other hand, if $p_1 \le 0.4$ or $p_2 \le 0.7$, then the probability of accepting the conjunction as true is at most $\max(\beta_1, \beta_2) = 0.04$. Figure 5.1 plots the probability of incorrectly verifying the given conjunction as a function of $p_1$ and $p_2$. The simulation results confirm two things. The probability of error is large inside the L-shaped indifference region, and the error bounds are respected outside of it.

Figure 5.1: Probability of an incorrect verification result for a conjunction $\Phi = \mathcal{P}_{\ge 0.5}[\varphi_1] \wedge \mathcal{P}_{\ge 0.75}[\varphi_2]$ as a function of the probabilities $p_1$ and $p_2$ that $\varphi_1$ and $\varphi_2$, respectively, hold over trajectories starting in some initial state $s_0$. The error bounds are $\alpha_1 = 0.01$, $\beta_1 = 0.04$, $\alpha_2 = 0.03$, and $\beta_2 = 0.02$. The border of the L-shaped indifference region is indicated by dashed lines. The plot was obtained using computer simulation, with 50,000 runs per data point.

The following result follows immediately from Theorem 5.4. It establishes the procedure for the verification of a conjunction: we use the target error bounds for the conjunction when verifying the individual conjuncts.

Corollary. To verify $\bigwedge_{i=1}^{n} \Phi_i$ with type I error probability $\alpha$ and type II error probability $\beta$, it is sufficient to verify each conjunct $\Phi_i$ with type I error probability $\alpha$ and type II error probability $\beta$.

We have now shown how to verify a UTSL formula without nested probabilistic operators so that conditions (5.4) and (5.3) are satisfied. An observation is obtained by generating a trajectory using discrete event simulation and verifying the path formula over the generated trajectory. To verify a negation, we verify the negated UTSL formula while reversing the roles of the error bounds. A conjunction is verified by verifying each conjunct using the same error bounds as intended for the conjunction. (The fact that we can use the same error bounds to verify the individual conjuncts will prove essential when dealing with nested probabilistic operators.) For probabilistic operators, we can use one of the acceptance sampling tests described in Section 2.2. In the next chapter, we present empirical results for our model checking algorithm using two different tests: the sequential version of a single sampling plan and Wald's sequential probability ratio test.


5.2  Model  Checking  with  Nested  Probabilistic  Operators 

In  this  section,  we  consider  UTSL  formulae  with  nested  probabilistic  operators.  If  a  path  formula  contains 
probabilistic  operators,  we  can  no  longer  assume  that  it  can  be  verified  without  error.  To  deal  with  the 
possibility  of  making  an  error  in  verifying  a  path  formula,  we  modify  the  semantics  given  in  Definition  5.1. 

Definition 5.2 (UTSL Semantics with Indifference Regions and Nesting). Let $\mathcal{M} = (S, T, \mu, SV, V)$ be a factored stochastic discrete event system, and let $\delta(\theta)$ be a function determining the half-width of an indifference region centered around $\theta$. A satisfaction relation $\mathrel{|\!\approx_\top}$ and an unsatisfaction relation $\mathrel{|\!\approx_\bot}$ for UTSL with indifference regions and nested probabilistic operators are simultaneously defined by induction as follows (the first six rules are the same as in Definition 5.1 and are therefore not repeated here):


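$$
\begin{array}{lcl}
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\ge\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\top} \varphi\}) \ge \theta + \delta(\theta)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\ge\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \varphi\}) \ge 1 - (\theta - \delta(\theta))\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\le\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\top} \varphi\}) \le \theta - \delta(\theta)\\
\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\le\theta}[\varphi] & \text{if} & \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \varphi\}) \le 1 - (\theta + \delta(\theta))\\
\mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\top} \mathcal{X}^{I}\,\Phi & \text{if} & \exists k \in \mathbb{N}.\,((T_{k-1} \le \tau) \wedge (\tau < T_k) \wedge (T_k - \tau \in I) \wedge (\mathcal{M}, \sigma_{\le T_k} \mathrel{|\!\approx_\top} \Phi))\\
\mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \mathcal{X}^{I}\,\Phi & \text{if} & \forall k \in \mathbb{N}.\,(((T_{k-1} \le \tau) \wedge (\tau < T_k) \wedge (T_k - \tau \in I)) \rightarrow (\mathcal{M}, \sigma_{\le T_k} \mathrel{|\!\approx_\bot} \Phi))\\
\mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\top} \Phi\ \mathcal{U}^{I}\ \Psi & \text{if} & \exists t \in I.\,((\mathcal{M}, \sigma_{\le\tau+t} \mathrel{|\!\approx_\top} \Psi) \wedge \forall t' \in T.\,((t' < t) \rightarrow (\mathcal{M}, \sigma_{\le\tau+t'} \mathrel{|\!\approx_\top} \Phi)))\\
\mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \Phi\ \mathcal{U}^{I}\ \Psi & \text{if} & \forall t \in I.\,((\mathcal{M}, \sigma_{\le\tau+t} \mathrel{|\!\approx_\bot} \Psi) \vee \exists t' \in T.\,((t' < t) \wedge (\mathcal{M}, \sigma_{\le\tau+t'} \mathrel{|\!\approx_\bot} \Phi)))
\end{array}
$$

Definition 5.2 is equivalent to Definition 5.1 for UTSL formulae that do not have nested probabilistic operators, so the semantics just given can be used even without nested probabilistic operators. To prove this, we first show that for UTSL formulae free of any probabilistic operators, the relations $\mathrel{|\!\approx_\top}$ and $\mathrel{|\!\approx_\bot}$ are equivalent to $\models$ and $\not\models$, respectively.

Lemma 5.5. If $\Phi$ is a UTSL formula that does not contain any probabilistic operators, then $(\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi) \iff (\mathcal{M}, \sigma_{\le\tau} \models \Phi)$ and $(\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi) \iff (\mathcal{M}, \sigma_{\le\tau} \not\models \Phi)$.

Proof by structural induction. If $\Phi$ is $x \sim v$, then the two equivalences follow immediately from Definitions 4.2 and 5.2. If the UTSL formula is $\neg\Phi$ or $\Phi \wedge \Psi$, where $\Phi$ and $\Psi$ are free of any probabilistic operators, assume that the equivalences hold for $\Phi$ and $\Psi$. It follows from Definitions 4.2 and 5.2 that the equivalences hold for the compound UTSL formulae. This covers all UTSL formulae that can be formed without any probabilistic operators according to Definition 4.1. □

Proposition 5.6. For UTSL formulae that do not contain nested probabilistic operators, Definitions 5.1 and 5.2 are equivalent.

Proof. The first six rules are identical for the two definitions. Because the path formulae are assumed not to contain probabilistic operators, it follows from Lemma 5.5 that the rules for path formulae are equivalent to the rules in Definition 4.2, which Definition 5.1 inherits. From this, it follows that the sets $\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\top} \varphi\}$ and $\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\}$ are equal. The rules for $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \mathcal{P}_{\bowtie\theta}[\varphi]$ are therefore equivalent for the two definitions. Analogously, the sets $\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \varphi\}$ and $\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \not\models \varphi\}$ are equal. Since $\mathcal{M}, \sigma, \tau \not\models \varphi$ is equivalent to $\neg(\mathcal{M}, \sigma, \tau \models \varphi)$, we have $\mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \mathrel{|\!\approx_\bot} \varphi\}) = 1 - \mu(\{\sigma \in \mathit{Path}(\sigma_{\le\tau}) \mid \mathcal{M}, \sigma, \tau \models \varphi\})$. Hence, the rules for $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \mathcal{P}_{\bowtie\theta}[\varphi]$ are also equivalent for the two definitions, and this covers all rules. □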
It is still the case that $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\top} \Phi$ implies $\mathcal{M}, \sigma_{\le\tau} \models \Phi$ and $\mathcal{M}, \sigma_{\le\tau} \mathrel{|\!\approx_\bot} \Phi$ implies $\mathcal{M}, \sigma_{\le\tau} \not\models \Phi$. We want our model checking algorithm to satisfy conditions (5.4) and (5.3) for the modified definition of the relations $\mathrel{|\!\approx_\top}$ and $\mathrel{|\!\approx_\bot}$ for UTSL formulae. Negation and conjunction can be handled in the same way as before, because the definition is unmodified in these cases, but probabilistic statements must now be handled differently.

5.2.1  Probabilistic  Operator 

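Consider the UTSL model checking problem $(\mathcal{M}, \sigma_{\leq t}, \mathcal{P}_{\geq\theta}[\varphi])$ (the case $\mathcal{P}_{\leq\theta}[\varphi]$ is analogous), and let $p = \mu(\{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\top} \varphi\})$ and $q = \mu(\{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\bot} \varphi\})$. The two conditions (5.4) and (5.3) can then be expressed as $\Pr[\sigma_{\leq t} \nvdash \mathcal{P}_{\geq\theta}[\varphi] \mid p \geq \theta + \delta(\theta)] \leq \alpha$ and $\Pr[\sigma_{\leq t} \vdash \mathcal{P}_{\geq\theta}[\varphi] \mid q \geq 1 - (\theta - \delta(\theta))] \leq \beta$, respectively. If these conditions are satisfied, then we accept $\mathcal{P}_{\geq\theta}[\varphi]$ with probability at least $1 - \alpha$ if $p \geq \theta + \delta(\theta)$, and with probability at most $\beta$ if $q \geq 1 - (\theta - \delta(\theta))$. If $p \geq \theta + \delta(\theta)$, then $\mu(\{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \models \varphi\}) > \theta$ definitely holds, because every trajectory counted in $p$ also satisfies $\varphi$; so there is a high probability of accepting $\mathcal{P}_{\geq\theta}[\varphi]$ when it holds with some margin. Conversely, if $q \geq 1 - (\theta - \delta(\theta))$, then $\mu(\{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \models \varphi\}) < \theta$ definitely holds, so $\mathcal{P}_{\geq\theta}[\varphi]$ is rejected with high probability when it is false with some margin.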

We want to use acceptance sampling, as before, to verify probabilistic statements. With probabilistic operators in the path formulae, it is possible that observations we use for the acceptance sampling test are incorrect. If we can at least bound the probability of a path formula being incorrectly verified, then we can modify the acceptance sampling test to account for the possibility of observation errors. In particular, we assume that $\Pr[\sigma, t \nvdash \varphi \mid \sigma, t \mathrel{|\!\approx}_{\top} \varphi] \leq \alpha'$ and $\Pr[\sigma, t \vdash \varphi \mid \sigma, t \mathrel{|\!\approx}_{\bot} \varphi] \leq \beta'$ for some $\alpha'$ and $\beta'$. To understand the general theoretical results presented below regarding acceptance sampling with observation error, it may help to keep the following interpretation of the random variables $X$ and $Y$ in mind:

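$$\begin{aligned} Y = 1 &\iff \mathcal{M}, \sigma, t \vdash \varphi &\qquad X = 1 &\iff \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\top} \varphi \\ Y = 0 &\iff \mathcal{M}, \sigma, t \nvdash \varphi &\qquad X = 0 &\iff \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\bot} \varphi \end{aligned}$$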

Note  that  Y  has  exactly  two  outcomes  and  is  therefore  a  Bernoulli  variate,  but  X  can  have  more  than  two 
outcomes.  Before  establishing  a  modified  acceptance  sampling  test,  we  need  the  following  intermediate 
result  regarding  two  arbitrary  random  variables  X  and  Y  with  some  correlation  between  their  observations. 
Lemma 5.7. Let $X$ and $Y$ be two random variables such that $\Pr[Y = 0 \mid X = 1] \leq \alpha'$ and $\Pr[Y = 1 \mid X = 0] \leq \beta'$. If $\Pr[X = 1] = p$ and $\Pr[X = 0] = q$, then $p(1 - \alpha') \leq \Pr[Y = 1] \leq 1 - q(1 - \beta')$.

Proof.  By  the  formula  of  total  probability  we  have 

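$$\begin{aligned} \Pr[Y = 1] &= \Pr[X = 1]\Pr[Y = 1 \mid X = 1] + \Pr[X = 0]\Pr[Y = 1 \mid X = 0] \\ &\quad + \Pr[X \notin \{0, 1\}]\Pr[Y = 1 \mid X \notin \{0, 1\}] \\ &= p\bigl(1 - \Pr[Y = 0 \mid X = 1]\bigr) + q\Pr[Y = 1 \mid X = 0] + (1 - p - q)\Pr[Y = 1 \mid X \notin \{0, 1\}]. \end{aligned}$$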

As an upper bound for $\Pr[Y = 1]$, we get $\Pr[Y = 1] \leq p(1 - 0) + q\beta' + (1 - p - q) \cdot 1 = 1 - q(1 - \beta')$. The lower bound for $\Pr[Y = 1]$ is derived as follows: $\Pr[Y = 1] \geq p(1 - \alpha') + q \cdot 0 + (1 - p - q) \cdot 0 = p(1 - \alpha')$. □
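The bounds of Lemma 5.7 are easy to check numerically. The following Python sketch (the function names are illustrative, not from the text) draws random distributions for $X$ over the outcomes $1$, $0$, and "anything else", draws random conditional probabilities for $Y = 1$, sets $\alpha'$ and $\beta'$ to the tightest values allowed by the lemma's hypotheses, and confirms that $\Pr[Y = 1]$ always falls between the two bounds.

```python
import random


def lemma_bounds(p, q, alpha_obs, beta_obs):
    """Lower and upper bounds on Pr[Y = 1] from Lemma 5.7."""
    return p * (1 - alpha_obs), 1 - q * (1 - beta_obs)


def check_lemma(trials=10_000, seed=1):
    """Randomized check that Pr[Y = 1] respects the bounds of Lemma 5.7."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Random distribution of X over the outcomes 1, 0, and "anything else".
        w = [rng.random() for _ in range(3)]
        total = sum(w)
        p, q, rest = (x / total for x in w)
        # Random conditional probabilities Pr[Y = 1 | X = ...].
        y1_x1, y1_x0, y1_other = rng.random(), rng.random(), rng.random()
        alpha_obs = 1 - y1_x1  # equals Pr[Y = 0 | X = 1]
        beta_obs = y1_x0       # equals Pr[Y = 1 | X = 0]
        # Formula of total probability.
        pr_y1 = p * y1_x1 + q * y1_x0 + rest * y1_other
        lo, hi = lemma_bounds(p, q, alpha_obs, beta_obs)
        if not lo - 1e-12 <= pr_y1 <= hi + 1e-12:
            return False
    return True


print(check_lemma())  # True
```

The outcome "anything else" is what makes the upper bound weaker than $1 - q$: those trajectories may or may not produce $Y = 1$, so the proof charges them fully to the upper bound and not at all to the lower bound.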

We can now show that, with the observation error bounded by $\alpha'$ and $\beta'$, it is sufficient to replace the probability thresholds $p_0$ and $p_1$ of a standard acceptance sampling test with $p_0(1 - \alpha')$ and $1 - (1 - p_1)(1 - \beta')$, respectively. In effect, this means that we narrow the indifference region for the acceptance sampling test in order to cope with the possibility of inaccurate observations.

Theorem 5.8 (Acceptance Sampling with Observation Error). Let $Y$ be a Bernoulli variate whose observations are related to the observations of a random variable $X$ as follows: $\Pr[Y = 0 \mid X = 1] \leq \alpha'$ and $\Pr[Y = 1 \mid X = 0] \leq \beta'$. Furthermore, let $\Pr[X = 1] = p$, $\Pr[X = 0] = q$, and $\Pr[Y = 1] = p'$. To test the hypothesis $H_0 : p \geq p_0$ against the alternative hypothesis $H_1 : q \geq 1 - p_1$, for probability thresholds $p_0 > p_1$, so that the probability of accepting $H_1$ when $H_0$ holds (type I error) is at most $\alpha$ and the probability of accepting $H_0$ when $H_1$ holds (type II error) is at most $\beta$, it is sufficient to test $H'_0 : p' \geq p_0(1 - \alpha')$ against $H'_1 : p' \leq 1 - (1 - p_1)(1 - \beta')$ with probability at most $\alpha$ that $H'_1$ is accepted when $H'_0$ holds and probability at most $\beta$ that $H'_0$ is accepted when $H'_1$ holds, provided that acceptance of $H'_0$ leads to acceptance of $H_0$ and acceptance of $H'_1$ leads to acceptance of $H_1$.

Proof. From (2.3), assuming a single sampling plan $\langle n, c \rangle$ is used, we get $F(c; n, p')$ as the probability of accepting hypothesis $H'_1$. We know from Lemma 5.7 that $p' \geq p(1 - \alpha')$. Since $F(c; n, p)$ is a non-increasing function of $p$ in the interval $[0, 1]$, we have $F(c; n, p') \leq F(c; n, p(1 - \alpha'))$, which, if $H_0 : p \geq p_0$ holds, is at most $F(c; n, p_0(1 - \alpha'))$. By choosing $n$ and $c$ so that $F(c; n, p_0(1 - \alpha')) \leq \alpha$, we ensure that the probability of accepting $H'_1$, and therefore also $H_1$, is at most $\alpha$ when $H_0$ holds.
The probability of accepting $H'_0$ is $1 - F(c; n, p')$ when using the single sampling plan $\langle n, c \rangle$. It follows from Lemma 5.7 that $p' \leq 1 - q(1 - \beta')$. Since $1 - F(c; n, p)$ is non-decreasing in $p$, we have $1 - F(c; n, p') \leq 1 - F(c; n, 1 - q(1 - \beta'))$, which in turn is at most $1 - F(c; n, 1 - (1 - p_1)(1 - \beta'))$ if $H_1 : q \geq 1 - p_1$ holds. Consequently, if we choose $n$ and $c$ so that $1 - F(c; n, 1 - (1 - p_1)(1 - \beta')) \leq \beta$, we are guaranteed that the probability of accepting $H'_0$, and therefore also $H_0$, is at most $\beta$ when $H_1$ holds. □
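The proof suggests a direct, if slow, way to find a single sampling plan that respects the modified thresholds: for increasing $n$, take the largest $c$ with $F(c; n, p_0(1 - \alpha')) \leq \alpha$ and check whether $1 - F(c; n, 1 - (1 - p_1)(1 - \beta')) \leq \beta$. The Python sketch below does exactly that by brute force. It is a stand-in for illustration, not Algorithm 2.1, and the function names are hypothetical. A plan is read as: accept $H'_0$ if more than $c$ of the $n$ observations are positive, otherwise accept $H'_1$.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass function, computed in log space for stability."""
    log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_term)


def binom_cdf(c, n, p):
    """F(c; n, p): probability of at most c positive observations out of n."""
    if c < 0:
        return 0.0
    return min(1.0, sum(binom_pmf(k, n, p) for k in range(min(c, n) + 1)))


def single_sampling_plan(p0, p1, alpha, beta, alpha_obs=0.0, beta_obs=0.0,
                         n_max=100_000):
    """Smallest plan <n, c> for the thresholds of Theorem 5.8 (brute-force search)."""
    p0_mod = p0 * (1 - alpha_obs)            # narrowed upper threshold
    p1_mod = 1 - (1 - p1) * (1 - beta_obs)   # narrowed lower threshold
    if not 0.0 < p1_mod < p0_mod < 1.0:
        raise ValueError("observation error leaves an empty indifference region")
    for n in range(1, n_max + 1):
        # Largest c with F(c; n, p0') <= alpha, accumulated one term at a time.
        c, cdf_p0 = -1, 0.0
        while c + 1 < n:
            nxt = cdf_p0 + binom_pmf(c + 1, n, p0_mod)
            if nxt > alpha:
                break
            c, cdf_p0 = c + 1, nxt
        # Accepting H0' needs more than c positives; bound that event at p1'.
        if c >= 0 and 1.0 - binom_cdf(c, n, p1_mod) <= beta:
            return n, c
    raise ValueError("no plan found with n <= n_max")
```

Taking the largest admissible $c$ for each $n$ is enough because $1 - F(c; n, p)$ decreases as $c$ grows, so if any $c$ works for a given $n$, that one does. Calling `single_sampling_plan(0.6, 0.4, 0.1, 0.1, 0.05, 0.05)` returns a plan at least as large as the one without observation error, reflecting the narrower indifference region.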

The above proof establishes Theorem 5.8 specifically for single sampling plans, but the result is more general because we only need to modify the probability thresholds in order to cope with observation error, while leaving the rest of the test procedure intact. We can use the exact same modification for other acceptance sampling tests, for example Wald's sequential probability ratio test. Note that the probability thresholds equal $p_0$ and $p_1$ if the observation error is zero, so the modified test is identical to the original test in that case, as should be expected. Note also that the observation error can be chosen independently of the desired strength of the test. A procedure for verifying probabilistic UTSL formulae with nested probabilistic operators follows immediately from Theorem 5.8.

Corollary. An acceptance sampling test with strength $\langle \alpha, \beta \rangle$ and probability thresholds $(\theta + \delta(\theta))(1 - \alpha')$ and $1 - (1 - (\theta - \delta(\theta)))(1 - \beta')$ can be used to verify $\mathcal{P}_{\geq\theta}[\varphi]$ with type I error probability $\alpha$ and type II error probability $\beta$, provided that $\varphi$ can be verified over trajectories with type I error probability $\alpha'$ and type II error probability $\beta'$.
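As a small sketch, the thresholds in the Corollary can be computed directly from $\theta$, $\delta(\theta)$, $\alpha'$, and $\beta'$; the helper name is hypothetical. With zero observation error it returns the unmodified thresholds $\theta \pm \delta(\theta)$, and it refuses error bounds so large that the narrowed indifference region disappears.

```python
def corollary_thresholds(theta, delta, alpha_obs, beta_obs):
    """Thresholds (p0', p1') from the Corollary for verifying P>=theta[phi]."""
    p0 = theta + delta   # upper threshold without observation error
    p1 = theta - delta   # lower threshold without observation error
    p0_mod = p0 * (1 - alpha_obs)
    p1_mod = 1 - (1 - p1) * (1 - beta_obs)
    if p0_mod <= p1_mod:
        raise ValueError("indifference region vanishes; reduce the observation error")
    return p0_mod, p1_mod


# theta = 0.9 with delta(theta) = 0.01 and a symmetric observation error of 0.001
print(corollary_thresholds(0.9, 0.01, 0.001, 0.001))
```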

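To better understand the verification procedure for the UTSL formula $\mathcal{P}_{\geq\theta}[\varphi]$ with nested probabilistic operators, consider the following four sets of trajectories:

$$\begin{aligned} P &= \{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \models \varphi\} &\qquad Q &= \{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \not\models \varphi\} \\ \tilde{P} &= \{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\top} \varphi\} &\qquad \tilde{Q} &= \{\sigma \in \mathit{Path}(\sigma_{\leq t}) \mid \mathcal{M}, \sigma, t \mathrel{|\!\approx}_{\bot} \varphi\} \end{aligned}$$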

We cannot determine membership in $P$ or $Q$ for a sampled trajectory $\sigma \in \mathit{Path}(\sigma_{\leq t})$ if $\varphi$ contains probabilistic operators. We assume, however, that we have a probabilistic procedure for determining membership in $P$ or $Q$. We require a probability of at most $\alpha'$ that $\sigma$ is determined to be in $Q$ if it is really in $\tilde{P}$, and the probability of determining that $\sigma$ is in $P$ should be at most $\beta'$ if $\sigma$ is actually in $\tilde{Q}$. Given such a procedure, Theorem 5.8 provides us with a way to test the hypothesis $H_0 : \mu(\tilde{P}) \geq \theta + \delta(\theta)$ against the alternative hypothesis $H_1 : \mu(\tilde{Q}) \geq 1 - (\theta - \delta(\theta))$. Acceptance of $H_0$ leads to acceptance of $\mathcal{P}_{\geq\theta}[\varphi]$ as true, and
[Figure 5.2 labels: $\Pr[\text{accept } \mu(P) < \theta] \geq 1 - \beta$ and $\Pr[\text{accept } \mu(P) \geq \theta] \geq 1 - \alpha$]


Figure 5.2: Probabilistic guarantees for model checking problems with UTSL formulae of the form $\mathcal{P}_{\geq\theta}[\varphi]$, with probabilistic operators in $\varphi$. The thick box represents all such model checking problems. In the right half are problems with an affirmative answer. A subset of these problems have an affirmative answer even with an indifference region of half-width $\delta(\theta)$ at the top level. For some of the latter set of problems, the UTSL formula holds with indifference regions at all levels. It is for this last set of problems that we can guarantee an affirmative answer with probability at least $1 - \alpha$. There is a similar hierarchy for the problems with a negative answer, in the left half of the thick box. The gray area represents the set of model checking problems for which we give no correctness guarantees.

acceptance of $H_1$ leads to rejection of $\mathcal{P}_{\geq\theta}[\varphi]$ as false. We are guaranteed that $H_0$ is accepted with probability at least $1 - \alpha$ if $H_0$ holds. Since $\tilde{P} \subseteq P$, we know that $\mu(P) > \theta$ when $H_0$ holds, so there is a high probability of accepting $\mathcal{P}_{\geq\theta}[\varphi]$ when it holds with some margin. We also know that $H_1$ is accepted with probability at least $1 - \beta$ if $H_1$ holds, and, since $\tilde{Q} \subseteq Q$, that $\mu(P) < \theta$ in that case, so there is a high probability of rejecting $\mathcal{P}_{\geq\theta}[\varphi]$ when it is false with some margin.

Figure 5.2 gives a graphical representation of the correctness guarantees provided by the algorithm for UTSL formulae with nested probabilistic operators. For the subset of all model checking problems such that $\mu(\tilde{P}) \geq \theta + \delta(\theta)$, it is guaranteed that an affirmative answer is given with probability at least $1 - \alpha$. For the problems such that $1 - \mu(\tilde{Q}) \leq \theta - \delta(\theta)$, it is guaranteed that a negative answer is given with probability at least $1 - \beta$. For the remaining problems, no guarantees are made regarding the correctness of the result.

5.2.2  Path  Formulae  with  Probabilistic  Operators 

We  have  established  a  procedure  for  verifying  probabilistic  statements  when  the  path  formula  cannot  be 
verified  without  some  probability  of  error.  It  remains  to  show  how  to  verify  path  formulae  containing 
probabilistic  operators  so  that  the  following  conditions  are  satisfied: 


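$$\Pr[\sigma, t \nvdash \varphi \mid \sigma, t \mathrel{|\!\approx}_{\top} \varphi] \leq \alpha' \tag{5.5}$$

$$\Pr[\sigma, t \vdash \varphi \mid \sigma, t \mathrel{|\!\approx}_{\bot} \varphi] \leq \beta' \tag{5.6}$$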

This is straightforward for $\mathbf{X}^{I}\,\Phi$: we simulate a single state transition and verify $\Phi$ in the resulting state. Path formulae of the form $\Phi\,\mathbf{U}^{I}\,\Psi$ are more difficult to handle. We need to find a $t \in I$ such that $\Psi$ is satisfied at time $t$ and $\Phi$ is satisfied at all time points $t'$ prior to $t$. Examples 4.1 and 4.2 showed that, for models that do not satisfy the Markov property, it is not sufficient to consider only the time points at which a state transition occurs. However, if the model is a Markov process, then it is sufficient to consider the time points at which state transitions occur, as mentioned in Chapter 4. This is guaranteed to be a finite number of time points if the assumptions of Theorem 5.1 are satisfied. The same can be said for any discrete-time model, provided that $\sup I$ is finite. If only a finite number of time points need be considered, then we can treat the verification of $\Phi\,\mathbf{U}^{I}\,\Psi$ as a large disjunction of conjunctions. Let $\{t_1, \ldots, t_n\}$ be the set of time points at which we may have to verify the subformulae, with $t_i \leq \sup I$. For Markov processes, these are the time points at which state transitions occur, and for discrete-time models these are all time points no later than $\sup I$. Furthermore, let $t_{n+1}$ be some time point later than $\sup I$. We can verify $\Phi\,\mathbf{U}^{I}\,\Psi$ as follows:

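$$\sigma, t \vdash \Phi\,\mathbf{U}^{I}\,\Psi \quad\text{if}\quad \bigvee_{i=1}^{n} \Bigl( (t_i \geq t) \wedge \bigl([t_i, t_{i+1}) \cap I \neq \emptyset\bigr) \wedge (\sigma_{\leq t_i} \vdash \Psi) \wedge \bigl((t_i \in I) \vee (\sigma_{\leq t_i} \vdash \Phi)\bigr) \wedge \bigwedge_{j=1}^{i-1} (\sigma_{\leq t_j} \vdash \Phi) \Bigr)$$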

Since disjunction can be expressed using conjunction and negation, and we already know how to verify negations and conjunctions using statistical techniques, this gives us a way to verify $\Phi\,\mathbf{U}^{I}\,\Psi$ so that conditions (5.5) and (5.6) are satisfied. Thus, it is sufficient simply to verify the UTSL formulae $\Phi$ and $\Psi$ with error bounds $\alpha'$ and $\beta'$ at each relevant time point along a trajectory.
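As a minimal sketch, the disjunction above can be evaluated once verdicts for $\Phi$ and $\Psi$ at each $t_i$ are available. The function below assumes those verdicts (`phi`, `psi`) have already been produced, for example by the statistical procedure with bounds $\alpha'$ and $\beta'$, and it takes $I$ to be the closed interval $[\inf I, \sup I]$. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def until_holds(t, times, t_after, phi, psi, inf_i, sup_i):
    """Evaluate sigma, t |- Phi U^I Psi as a disjunction of conjunctions.

    times   -- t_1 < ... < t_n, the time points at which the state may change
    t_after -- t_{n+1}, some time point later than sup I
    phi/psi -- verdicts for Phi and Psi in the state entered at each t_i
    I is taken to be the closed interval [inf_i, sup_i] (an assumption).
    """
    bounds = list(times) + [t_after]
    for i, t_i in enumerate(times):
        t_next = bounds[i + 1]
        overlaps_i = t_i <= sup_i and t_next > inf_i  # [t_i, t_{i+1}) meets I
        in_i = inf_i <= t_i <= sup_i
        # Psi holds in this segment, Phi covers any gap before I is reached,
        # and Phi held in every earlier segment (the inner conjunction).
        if (t_i >= t and overlaps_i and psi[i]
                and (in_i or phi[i]) and all(phi[:i])):
            return True
    return False


# Psi first holds at t_3 = 2, inside I = [0, 3], and Phi holds before that.
print(until_holds(0, [0, 1, 2], 5, [True, True, False], [False, False, True], 0, 3))
```

In a real implementation the verdicts would be computed lazily, so that the statistical tests for $\Phi$ and $\Psi$ run only at the time points the disjunction actually reaches.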

5.2.3  Observation  Error 

A noteworthy consequence of Theorem 5.8 is that the bounds on the observation error, $\alpha'$ and $\beta'$, can be chosen independently of the bounds on the probability of a verification error occurring, $\alpha$ and $\beta$. We can decrease $\alpha'$ and $\beta'$ to widen the indifference region of the outer probabilistic statement and therefore lower the sample size required to verify this part of the formula, but this will increase the effort required per observation, since we have to verify the nested probabilistic statements with higher accuracy. If we increase $\alpha'$ and $\beta'$ to lower the effort per observation, then we need to make more observations. Clearly, there is a tradeoff here, and the choice of bounds on the observation error can have a great impact on performance.

Ideally,  we  want  to  use  the  observation  error  that  minimizes  the  expected  verification  effort,  but  this 
quantity  is  non-trivial  to  compute.  We  propose,  instead,  a  heuristic  estimate  of  the  verification  effort  that 
can be computed efficiently.

Definition 5.3 (Estimated Effort Heuristic). Let $n(p_0, p_1, \alpha, \beta)$ be the expected sample size of an acceptance sampling test with strength $\langle \alpha, \beta \rangle$ for probability thresholds $p_0$ and $p_1$, and let $q$ be the expected number of state transitions within a unit interval of time. We define a heuristic estimate of the effort required to verify a UTSL formula inductively as follows:

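$$\begin{aligned} \mathit{effort}(x \sim v, \alpha, \beta, \alpha', \beta') &= 1 \\ \mathit{effort}(\neg\Phi, \alpha, \beta, \alpha', \beta') &= \mathit{effort}(\Phi, \alpha, \beta, \alpha', \beta') \\ \mathit{effort}(\Phi \wedge \Psi, \alpha, \beta, \alpha', \beta') &= \mathit{effort}(\Phi, \alpha, \beta, \alpha', \beta') + \mathit{effort}(\Psi, \alpha, \beta, \alpha', \beta') \\ \mathit{effort}(\mathcal{P}_{\geq\theta}[\varphi], \alpha, \beta, \alpha', \beta') &= n\bigl((\theta + \delta(\theta))(1 - \alpha'), 1 - (1 - (\theta - \delta(\theta)))(1 - \beta'), \alpha, \beta\bigr) \cdot \min_{\alpha'', \beta''} \mathit{effort}(\varphi, \alpha', \beta', \alpha'', \beta'') \\ \mathit{effort}(\mathbf{X}^{I}\,\Phi, \alpha, \beta, \alpha', \beta') &= \mathit{effort}(\Phi, \alpha, \beta, \alpha', \beta') \\ \mathit{effort}(\Phi\,\mathbf{U}^{I}\,\Psi, \alpha, \beta, \alpha', \beta') &= q \cdot \sup I \cdot \mathit{effort}(\Phi, \alpha, \beta, \alpha', \beta') + q \cdot (\sup I - \inf I) \cdot \mathit{effort}(\Psi, \alpha, \beta, \alpha', \beta') \end{aligned}$$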

For discrete-time models, we set $q$ to 1, and for continuous-time Markov processes, $q$ can be set to the maximum exit rate of the model. The quantity $n(p_0, p_1, \alpha, \beta)$ depends on the acceptance sampling test that is used to verify probabilistic properties. If we use a single sampling plan, then we can compute $n$ exactly using Algorithm 2.1 or approximately using (2.8). Estimating the effort of verifying the UTSL formula $\mathcal{P}_{\geq\theta}[\varphi]$ when using a sequential sampling plan is trickier, because the expected sample size is a function of the unknown probability measure $p$ of the set of trajectories satisfying $\varphi$. It may be reasonable to minimize the worst-case estimated effort. For Wald's sequential probability ratio test, we can use the value of $E_p$ for $p = s$ given in Table 2.3.
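The inductive definition translates almost line for line into a recursive function. In this Python sketch the formula classes, the `effort` signature, and the finite `grid` used to approximate the minimization over symmetric $\alpha'' = \beta''$ are all illustrative assumptions. The sample-size function `n` is passed in, so it can be an exact single-sampling computation, an approximation such as (2.8), or a worst-case estimate for a sequential test. A probabilistic operator whose narrowed thresholds no longer satisfy $p_0 > p_1$ is given infinite effort.

```python
import math
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Atom:      # x ~ v
    name: str

@dataclass
class Not:       # negation
    sub: "Formula"

@dataclass
class And:       # conjunction
    left: "Formula"
    right: "Formula"

@dataclass
class Prob:      # P>=theta[path], with delta = delta(theta)
    theta: float
    delta: float
    path: "Formula"

@dataclass
class Next:      # X^I sub
    sub: "Formula"

@dataclass
class Until:     # left U^I right, with inf I = lo and sup I = hi
    left: "Formula"
    right: "Formula"
    lo: float
    hi: float


Formula = Union[Atom, Not, And, Prob, Next, Until]
SampleSize = Callable[[float, float, float, float], float]


def effort(f, a, b, a1, b1, n: SampleSize, q, grid):
    """Heuristic effort of Definition 5.3 (hypothetical signature).

    a, b   -- error bounds (alpha, beta) for f itself
    a1, b1 -- observation error (alpha', beta') for operators directly inside f
    q      -- expected number of state transitions per unit of time
    grid   -- candidate symmetric values for (alpha'', beta'') in the min
    """
    def rec(g, *errors):
        return effort(g, *errors, n, q, grid)

    if isinstance(f, Atom):
        return 1.0
    if isinstance(f, (Not, Next)):
        return rec(f.sub, a, b, a1, b1)
    if isinstance(f, And):
        return rec(f.left, a, b, a1, b1) + rec(f.right, a, b, a1, b1)
    if isinstance(f, Until):
        return (q * f.hi * rec(f.left, a, b, a1, b1)
                + q * (f.hi - f.lo) * rec(f.right, a, b, a1, b1))
    if isinstance(f, Prob):
        p0 = (f.theta + f.delta) * (1 - a1)
        p1 = 1 - (1 - (f.theta - f.delta)) * (1 - b1)
        if p0 <= p1:  # observation error too large: no valid test exists
            return math.inf
        inner = min(rec(f.path, a1, b1, x, x) for x in grid)
        return n(p0, p1, a, b) * inner
    raise TypeError(f"unknown formula node: {f!r}")
```

Because every nested operator repeats the grid minimization, this naive recursion costs on the order of $|\mathit{grid}|^d$ evaluations for nesting depth $d$. The outward search from the innermost operators described below avoids that blow-up.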

The observation error obviously cannot be set to zero, but there is an upper bound as well, because the width of the indifference region for an acceptance sampling test must be positive. In the case of acceptance sampling with observation error, the condition $1 - (1 - p_1)(1 - \beta') < p_0(1 - \alpha')$ must be satisfied. From this condition, we can derive an upper bound on the symmetric observation error ($\alpha' = \beta'$):

$$1 - (1 - p_1)(1 - \alpha') < p_0(1 - \alpha') \implies 1 < (1 + p_0 - p_1)(1 - \alpha') \implies \alpha' < \frac{p_0 - p_1}{1 + (p_0 - p_1)} \tag{5.7}$$
The difference $p_0 - p_1$ is the intended width of the indifference region with zero observation error, which in our case equals $2\delta(\theta)$. We can therefore write (5.7) as $\alpha' < \delta(\theta)/(0.5 + \delta(\theta))$.

Example 5.2. With $p_0 = 0.91$ and $p_1 = 0.89$, the maximum symmetric observation error is $0.02/1.02 \approx 0.0196$ according to (5.7). This means that the probability of error must be below 0.0196 for each individual observation when using an indifference region of width 0.02.
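Bound (5.7) and the ordering condition it is derived from can be checked directly. The helper names below are illustrative. The second function tests whether a symmetric observation error $x$ still leaves the narrowed thresholds ordered, so the bound from Example 5.2 should sit exactly at the point where that test flips.

```python
def max_symmetric_observation_error(p0, p1):
    """Upper bound (5.7) on alpha' = beta' for thresholds p0 > p1."""
    width = p0 - p1
    if width <= 0:
        raise ValueError("need p0 > p1")
    return width / (1 + width)


def region_is_nonempty(p0, p1, x):
    """True if thresholds narrowed by symmetric observation error x stay ordered."""
    return p0 * (1 - x) > 1 - (1 - p1) * (1 - x)


bound = max_symmetric_observation_error(0.91, 0.89)
print(round(bound, 4))                                # 0.0196
print(region_is_nonempty(0.91, 0.89, 0.99 * bound))   # True
print(region_is_nonempty(0.91, 0.89, 1.01 * bound))   # False
```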

We can find the optimal symmetric observation error for each probabilistic operator of a UTSL formula using numerical function minimization, systematically working our way outward from the innermost probabilistic operators. For the innermost probabilistic operators, we can use zero observation error because their path formulae do not contain any probabilistic operators. We can find the optimal symmetric observation error for the remaining probabilistic operators by searching for the value of $x = \alpha' = \beta'$ that minimizes $\mathit{effort}(\mathcal{P}_{\geq\theta}[\varphi], \alpha, \beta, x, x)$ in the interval $(0, \delta(\theta)/(0.5 + \delta(\theta)))$. A lower effort could, conceivably, be achieved with an asymmetric observation error, but finding the asymmetric observation error with minimal estimated effort would require optimization in two dimensions.

Example 5.3. Consider the UTSL formula $\Phi = \mathcal{P}_{\geq 0.9}[\mathbf{X}\,\mathcal{P}_{\geq 0.85}[\mathbf{X}\,x{=}1]]$, and assume that we use (5.1) with $\delta_0 = 0.05$ to determine the width of indifference regions. This gives us the probability thresholds $p_0 = 0.91$ and $p_1 = 0.89$ for the outer probabilistic operator, and $p'_0 = 0.865$ and $p'_1 = 0.835$ for the inner probabilistic operator. Furthermore, assume that we want to verify $\Phi$ with error bounds $\alpha = \beta = 0.01$. Using Definition 5.3 and assuming symmetric observation error, we estimate the effort of verifying $\Phi$ as the product of $n(p_0(1 - \alpha'), 1 - (1 - p_1)(1 - \alpha'), \alpha, \beta)$ and $n(p'_0, p'_1, \alpha', \alpha')$. Figure 5.3 plots the two factors of the total estimated effort separately for a single sampling plan. This choice of sampling plan means that the estimated effort is equal to the actual effort. The dotted line indicates the upper bound on the symmetric observation error: $0.02/1.02 \approx 0.0196$. The total effort is plotted in Figure 5.4. The effort is minimal at $\alpha' = \beta' \approx 0.00153$, which is therefore the optimal symmetric observation error for a single sampling plan.
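To reproduce the shape of the minimization in Example 5.3 without the exact single sampling plans behind Figures 5.3 and 5.4, the sketch below substitutes a standard normal approximation for $n(p_0, p_1, \alpha, \beta)$ (an assumption; it is not necessarily the approximation (2.8)) and minimizes the product of the two factors by golden-section search over $(0, 0.02/1.02)$. Golden-section search assumes the total effort is unimodal on the interval. Because the sample sizes are approximate, the optimum it reports need not coincide with the value 0.00153 obtained with exact plans.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function


def approx_sample_size(p0, p1, alpha, beta):
    """Normal approximation to the size of a single sampling plan (an assumption)."""
    spread = (_z(1 - alpha) * math.sqrt(p0 * (1 - p0))
              + _z(1 - beta) * math.sqrt(p1 * (1 - p1)))
    return (spread / (p0 - p1)) ** 2


def total_effort(x, p0=0.91, p1=0.89, ip0=0.865, ip1=0.835, alpha=0.01, beta=0.01):
    """Outer factor times inner factor from Example 5.3, with alpha' = beta' = x."""
    outer = approx_sample_size(p0 * (1 - x), 1 - (1 - p1) * (1 - x), alpha, beta)
    inner = approx_sample_size(ip0, ip1, x, x)
    return outer * inner


def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:      # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:            # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


upper = 0.02 / 1.02  # bound (5.7) for p0 = 0.91 and p1 = 0.89
x_opt = golden_section_min(total_effort, 1e-9, upper * (1 - 1e-6))
print(f"x = {x_opt:.5f}, estimated effort = {total_effort(x_opt):.3g}")
```

The endpoints are pulled slightly inside the open interval because the outer factor diverges as $x$ approaches the bound and the inner factor diverges as $x$ approaches zero.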

5.2.4  Memoization 

Statistical  verification  of  UTSL  formulae  with  nested  probabilistic  operators  can  be  rather  costly  because 
each  observation  for  the  outermost  probabilistic  operator  involves  at  least  one  acceptance  sampling  test. 
Figure 5.3: Heuristic estimate, as a function of the symmetric observation error $\alpha'$, of the effort needed for the verification of the inner (dashed curve) and outer (solid curve) probabilistic operators of the UTSL formula $\mathcal{P}_{\geq 0.9}[\mathbf{X}\,\mathcal{P}_{\geq 0.85}[\mathbf{X}\,x{=}1]]$.

Figure 5.4: Total heuristic estimated effort, as a function of the symmetric observation error $\alpha'$, for the UTSL formula $\mathcal{P}_{\geq 0.9}[\mathbf{X}\,\mathcal{P}_{\geq 0.85}[\mathbf{X}\,x{=}1]]$.


When the path formula is $\Phi\,\mathbf{U}^{I}\,\Psi$ with $\Phi$ or $\Psi$ being probabilistic statements, each observation may require acceptance sampling to be performed for every state visited along a trajectory before time $\sup I$. We can improve performance radically through the use of memoization (Michie 1968). This means that each component of a path formula is verified only once in a specific state.

Memoization does not affect the validity of the verification result, since a time-bounded until formula can be treated as a large conjunction, and we have noted that Theorem 5.4 does not require conjuncts to be verified independently. Thus, we can ensure error bounds $\alpha'$ and $\beta'$ for each observation even if we reuse verification results along a sample trajectory. It is also safe to reuse memoized results across observations. If we ensure that each trajectory is an independent sample, each observation will be independent as well. This means that each nested probabilistic statement needs to be verified only once per unique visited state.
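A minimal sketch of such a memo table follows. The `check` callable is a hypothetical stand-in for the acceptance sampling test that verifies a nested probabilistic statement in a given state; here it is replaced by a trivial predicate so that the caching behavior is visible. One cache is shared across all sample trajectories, as the text permits.

```python
from typing import Callable, Dict, Hashable


class MemoizedCheck:
    """Cache the verdict of a nested probabilistic statement for each state.

    `check` stands for a (hypothetical) acceptance sampling procedure that
    verifies the nested statement in a given state with bounds alpha', beta'.
    """

    def __init__(self, check: Callable[[Hashable], bool]):
        self._check = check
        self._cache: Dict[Hashable, bool] = {}
        self.calls = 0  # how many times the expensive check actually ran

    def __call__(self, state: Hashable) -> bool:
        if state not in self._cache:
            self.calls += 1
            self._cache[state] = self._check(state)
        return self._cache[state]


nested = MemoizedCheck(lambda state: state % 2 == 0)  # stand-in for a real test
trajectories = [[0, 1, 2, 1, 0], [2, 3, 3, 0]]
verdicts = [[nested(s) for s in traj] for traj in trajectories]
print(nested.calls)  # 4: one run per unique state (0, 1, 2, 3)
```

For hashable states, `functools.lru_cache(maxsize=None)` would give the same effect; the explicit class just makes the number of expensive checks easy to inspect.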


5.3  Distributed  Acceptance  Sampling 


Statistical solution methods that use samples of independent observations are trivially parallelizable. We can use multiple computers to generate the observations, as noted already by Metropolis and Ulam (1949, p. 340), and expect a speedup linear in the added computing power. We must, of course, ensure that the observations generated by different machines are indeed independent, and this requires extra care when
Figure 5.5: Master/slave architecture and communication protocol for distributed acceptance sampling.


initializing  the  pseudorandom  number  generators  on  each  machine.  It  may  not  be  sufficient  simply  to  use 
a  different  seed  on  each  machine,  because  the  seed  determines  only  the  start  of  a  sequence  of  numbers  and 
does  not  alter  the  way  in  which  these  numbers  are  generated.  Independence  can  be  assured,  for  example,  if 
we  use  the  scheme  proposed  by  Matsumoto  and  Nishimura  (2000),  which  encodes  a  process  identifier  into 
the  pseudorandom  number  generator.  This  effectively  creates  a  new  pseudorandom  number  generator  for 
each  unique  identifier  rather  than  a  different  segment  of  a  sequence  from  the  same  generator,  as  is  the  case 
when  only  the  seed  is  varied. 

It is natural to adopt a master/slave architecture (Figure 5.5) for the distributed verification task. One or more slave processes register their ability to generate observations with a single master process. The master process collects observations from the slave processes and performs an acceptance sampling procedure. Independent observations can be generated by separate slave processes, running on different nodes of a computer network or multiprocessor machine, without the need for communication between the slave processes. Each slave process is assigned a unique identifier by the master process to ensure that the slave processes use different pseudorandom number generators. After the initial communication to register the slave process with the master process and inform the slave process of its identifier and the model it should use, the only communication required is a single bit from a slave process to the master process for each observation that is generated. The right side of Figure 5.5 illustrates a typical communication session between slave and master processes.
Figure 5.6: Discrete-time Markov process used to illustrate risk of bias in distributed sampling.

5.3.1  Unbiased  Distributed  Sampling 

When using distributed sampling with a sequential test, such as Wald's sequential probability ratio test, it is important not to introduce a bias against observations that take a longer time to generate. For UTSL model checking, each observation involves the generation of a trajectory prefix through discrete event simulation and the verification of a path formula over the generated trajectory prefix. If we were simply to use observations as they became available, we could easily end up violating the probabilistic guarantees of the acceptance sampling test as specified by the parameters $\alpha$, $\beta$, and $\delta$. This is illustrated by the following example.

Example 5.4. Consider the discrete-time Markov process shown in Figure 5.6 and assume that we want to verify the UTSL property $\mathcal{P}_{\geq 0.9}[x<n \;\mathbf{U}\; x<0]$ in the state satisfying $x=0$. Note that sample trajectories starting in the state with $x=0$ and satisfying the path formula $x<n \;\mathbf{U}\; x<0$ involve a single transition, while sample trajectories not satisfying the path formula involve $n$ transitions. Thus, while the property actually holds with probability $p$, the effort required to produce a negative observation is roughly $n$ times as high as that required to produce a positive observation. If we use $m$ slave processes to generate observations, and ignore communication overhead, we can expect to see $\sum_{i=1}^{n-1} m p^i = mp(1 - p^{n-1})/(1 - p)$ positive observations before seeing a negative observation. If, instead, we generate the observations with a single process, the expected number of positive observations before the first negative observation is $\sum_{i=1}^{\infty} i p^i (1 - p) = p/(1 - p)$. These numbers differ by a factor of $m(1 - p^{n-1})$. Figure 5.7 shows how this can introduce bias in the analysis, leading to an acceptance sampling test with a probability of accepting the hypothesis $H_0 : p \geq p_0$ that varies significantly with $m$.
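The two expectations in Example 5.4, and the factor $m(1 - p^{n-1})$ between them, can be checked numerically. The function names are illustrative, and the parameter values in the usage lines are arbitrary, not those behind Figure 5.7.

```python
def expected_positives_distributed(p, n, m):
    """m * sum_{i=1}^{n-1} p^i: positives expected before the first negative, m slaves."""
    return m * sum(p ** i for i in range(1, n))


def expected_positives_single(p):
    """p / (1 - p): positives expected before the first negative, one process."""
    return p / (1 - p)


p, n, m = 0.9, 10, 8
ratio = expected_positives_distributed(p, n, m) / expected_positives_single(p)
print(round(ratio, 4), round(m * (1 - p ** (n - 1)), 4))  # 4.9006 4.9006
```

With these values, eight slaves would report almost five times as many positive observations before the first negative one as a single process would, which is exactly the distortion that Figure 5.7 shows in the acceptance probability.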

This  bias  is  avoided  by  committing,  a  priori,  to  the  order  in  which  observations  will  be  taken  into 
account.  This  can  be  accomplished,  for  example,  by  processing  observations  in  cyclic  order.  Thus,  if  slave 
process  0  produces  two  observations  before  slave  process  1  produces  a  single  observation,  the  master  process 
waits  for  an  observation  from  slave  process  1  before  processing  the  second  observation  from  slave  process 
Figure 5.7: Probability of accepting $\mathcal{P}_{\geq 0.9}[x<n \;\mathbf{U}\; x<0]$ for distributed acceptance sampling with $m$ machines, using observations immediately as they arrive.

0.  Observations  that  are  received  out-of-order  are  buffered  until  it  is  time  to  process  them. 

The cyclic scheme works well if the slave processes are executed on homogeneous nodes. In a heterogeneous environment, however, a pure cyclic scheme will not take full advantage of the available computational resources. In the same amount of time, a slave process running on a fast machine will generate, on average, more observations than a process running on a slow machine. The cyclic scheme, however, will use the same number of observations from both slave processes. As a result, a potentially large fraction of the observations generated on the faster machine will go to waste and the speedup will therefore not be as large as one would expect from the added computing power.

To  address  this  problem,  we  can  maintain  a  dynamic  schedule,  instead  of  a  static  schedule,  of  the  order 
in which observations are processed. At the beginning, we schedule to receive one observation from each
slave  process  in  a  specific  order.  When  an  observation  arrives  from  slave  process  i,  we  insert  i  at  the  end 
of  the  current  schedule,  leaving  two  entries  for  i  in  the  schedule.  We  then  check  if  i  is  at  the  front  of  the 
schedule,  in  which  case  we  immediately  process  the  observation  and  pop  i  from  the  front.  Otherwise,  we 
buffer  the  observation  for  later  use.  At  the  removal  of  an  item  from  the  front  of  the  schedule,  we  check 
to  see  if  there  is  a  buffered  observation  for  the  new  front  item.  We  keep  processing  buffered  observations, 
removing  the  front  item  of  the  schedule  for  each  processed  observation,  until  the  front  item  has  no  buffered 
observations. 
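The adaptive schedule can be sketched as a small master-side data structure. This Python version is an illustration under simple assumptions: slave identifiers are known up front, each observation is a single bit, and the `processed` list stands in for feeding observations to the sequential test. The class name is hypothetical.

```python
from collections import deque


class AdaptiveSchedule:
    """Master-side ordering of observations from slave processes."""

    def __init__(self, slave_ids):
        self.schedule = deque(slave_ids)               # one entry per slave to start
        self.buffers = {i: deque() for i in slave_ids}  # out-of-order observations
        self.processed = []                             # order seen by the test

    def receive(self, slave, observation):
        # Reschedule the next observation from this slave at the end...
        self.schedule.append(slave)
        self.buffers[slave].append(observation)
        # ...then process observations while the front slave has one waiting.
        while self.schedule and self.buffers[self.schedule[0]]:
            front = self.schedule.popleft()
            self.processed.append((front, self.buffers[front].popleft()))


master = AdaptiveSchedule([0, 1])
for slave, obs in [(0, 1), (0, 0), (1, 1), (0, 1)]:
    master.receive(slave, obs)
print(master.processed)  # [(0, 1), (1, 1), (0, 0), (0, 1)]
```

Slave 0 contributes three of the four processed observations because it reports more often, yet its second observation is held in the buffer until slave 1 has reported. This is the commitment to an order, fixed before the observation values are seen, that removes the bias illustrated in Example 5.4.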
By rescheduling the processing of the next observation for a slave process at the arrival of an observation, we get a schedule that automatically adjusts to variations in performance of slave processes. If we have two slave processes, with process 0 running on a machine that is twice as fast as the machine that process 1 is running on, then the adaptive schedule will lead to us processing, on average, twice as many observations from process 0 as from process 1. This happens automatically, without the need for explicit communication of performance characteristics of the nodes on which slave processes are running.

5.3.2  Out-of-Order  Observations 

With the adaptive ordering of observations, we are guaranteed linear speedup, at least in the limit. We can
potentially  do  even  better  by  processing  out-of-order  observations  as  they  arrive,  although  of  course  not  in 
the  naive  way  that  has  already  been  shown  to  introduce  bias  against  long  sample  trajectories. 

Recall from Section 2.2.3 that the first $m$ observations $x_1, \ldots, x_m$ can be summarized with the statistic $d_m = \sum_{i=1}^{m} x_i$, and that a sequential acceptance sampling test can be carried out by comparing $d_m$ at each stage to an acceptance number $a_m$ and a rejection number $r_m$. Assume that we have processed $m$ in-order observations when observation $x_l$ arrives. We proceed as usual if $l = m + 1$, but we want to take the observation into account immediately even if $l > m + 1$, instead of waiting until after we have received observations $x_{m+1}$ through $x_{l-1}$. This can be done, without altering the probability of accepting $H_0$ for the acceptance sampling test, by computing lower and upper bounds for $d_{m+1}$ through $d_l$. We define the following quantities:

{Xi  if  X{  has  been  received  I  Xi  if  Xi  has  been  received 

xi  =  < 

0  otherwise  I  1  otherwise 

The  lower  bound  for  d,  is  d,  =  Yl}= 1  xj  and  the  upper  bound  is  d,  =  Yl)= 1  xj-  Wc  can  accept  Hq  at  stage 
l  if  d[  >  ai  and  dn  >  77  for  all  i  <  l.  The  second  condition  prevents  us  from  accepting  Hq  at  a  stage  if  it 
is  still  possible  that  Hi  could  be  accepted  at  an  earlier  stage.  If  we  were  to  ignore  this  condition,  then  we 
could  end  up  with  a  biased  acceptance  sampling  test  again.  The  conditions  for  acceptance  of  Hi  at  stage  l 

is  d/  <  ri  and  di  <  ai  for  all  i  <  l. 
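The two acceptance conditions translate directly into a single pass over the stages. The sketch below is illustrative only. It assumes 0/1 observations keyed by stage index and, for concreteness, acceptance and rejection numbers taken from a sequential version of a single sampling plan $(n, c)$, namely $a_m = c + 1$ and $r_m = c - (n - m)$. The helper names `single_sampling_bounds` and `out_of_order_decision` are hypothetical.

```python
def single_sampling_bounds(n, c):
    """Acceptance/rejection numbers assumed for a sequential single sampling
    plan (n, c): accept H0 once d_m >= c + 1, accept H1 once d_m <= c - (n - m).
    Index 0 is unused so that a[m] and r[m] refer to stage m."""
    a = [None] + [c + 1] * n
    r = [None] + [c - (n - m) for m in range(1, n + 1)]
    return a, r


def out_of_order_decision(received, a, r):
    """Decide H0/H1 from possibly out-of-order 0/1 observations.

    `received` maps a stage index l >= 1 to the observation x_l.  A missing
    observation contributes 0 to the lower bound and 1 to the upper bound.
    Returns "H0", "H1", or None if the outcome still depends on missing data.
    """
    n = len(a) - 1
    lo = hi = 0                    # running lower and upper bounds on d_l
    h1_possible_earlier = False    # some i < l with lo_i <= r_i
    h0_possible_earlier = False    # some i < l with hi_i >= a_i
    for l in range(1, n + 1):
        x = received.get(l)
        lo += 0 if x is None else x
        hi += 1 if x is None else x
        if lo >= a[l] and not h1_possible_earlier:
            return "H0"
        if hi <= r[l] and not h0_possible_earlier:
            return "H1"
        h1_possible_earlier = h1_possible_earlier or lo <= r[l]
        h0_possible_earlier = h0_possible_earlier or hi >= a[l]
    return None
```

For the plan $(10, 5)$, receiving ones for stages 1, 2, and 4 through 7 (stage 3 missing) already yields `"H0"`. Receiving ones for stages 1 through 5 and zeros for 7 through 10, with stage 6 missing, yields `None`, because the outcome hinges on $x_6$.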

Figure 5.8(a) shows an example of sequential acceptance sampling with out-of-order observations. In
this case, observations 7 through 11 arrive before observation 6, but it is safe to accept $H_0$ without waiting
for observation 6. The speedup can be significant if observation 6 happens to take an exceptionally long
time to generate. In Figure 5.8(b), we have an example of a situation where we have to wait for observation
6, because the final outcome of the test depends on it: if $x_6 = 1$ we will accept $H_0$, but if $x_6 = 0$ we will
accept $H_1$.

Figure 5.8: Acceptance sampling with out-of-order observations. The solid curve in each of the plots represents $\overline{d}_m$
and the dotted curve represents $\underline{d}_m$. Note that both curves cross the acceptance line for $H_1$ in (b), but that the curve
for $\overline{d}_m$ crosses the acceptance line for $H_0$ at an earlier stage.


5.4  Complexity  of  Statistical  Probabilistic  Model  Checking 

The time complexity of statistical probabilistic model checking depends on the number of observations
(sample size) required to reach a decision, as well as the time required to generate each observation. An
observation involves the verification of a path formula $\varphi$ over a sample trajectory $\sigma_i$. The sample size for a
sequential acceptance sampling test is a random variable, and so is the time per observation, which means
that we can generally only talk about the expected complexity of statistical probabilistic model checking.

First, consider the time complexity for UTSL formulae without nested probabilistic operators. The first
component of the complexity is the time per observation. A sample trajectory $\sigma_i$ may very well be infinite,
but in order to verify the path formula $\mathrm{X}^{I}\,\Phi$ we only need to consider a finite prefix of $\sigma_i$. The same is
true for path formulae of the form $\Phi\,\mathrm{U}^{I}\,\Psi$ if the conditions of Theorem 5.1 are satisfied. Without nested
probabilistic operators, nested UTSL formulae will be classical logic expressions, which we assume can be
verified in constant time. Let $m$ be the expected effort to simulate a state transition. The time per observation
is proportional to $m$ for $\mathrm{X}^{I}\,\Phi$ and proportional to $m$ times the number of state transitions that occur in a
time interval of length $\sup I$ for $\Phi\,\mathrm{U}^{I}\,\Psi$. Let $q$ denote the expected number of state transitions that occur in
a unit length interval of time. For continuous-time Markov processes, an upper bound for $q$ is the maximum
exit rate of any state. The expected time per observation is then $O(m \cdot q \cdot \sup I)$ for $\Phi\,\mathrm{U}^{I}\,\Psi$. This is a
worst-case estimate, because it assumes that $\neg\Phi \vee \Psi$ is not satisfied prior to time $\sup I$. If we reach a state
satisfying $\neg\Phi \vee \Psi$ long before visiting $q \cdot \sup I$ states, then we can determine the truth value of $\Phi\,\mathrm{U}^{I}\,\Psi$
without considering further states.
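The early-termination argument can be made concrete with a sketch of how one observation for $\Phi\,\mathrm{U}^{I}\,\Psi$ might be generated. It assumes $I = [0, t_{\max}]$ (lower bound 0) and a simulator exposed as a hypothetical `step(state, rng)` function returning a holding time and a successor state. `ctmc_step` is an illustrative helper for continuous-time Markov chains, not an API from the thesis. The returned transition count is the trajectory length, which is at most about $q \cdot \sup I$ on average and often much smaller.

```python
import math
import random


def check_until(state, step, phi, psi, t_max, rng):
    """Check Phi U^[0, t_max] Psi over one sampled trajectory.

    step(state, rng) -> (holding_time, next_state) samples one transition.
    Returns (truth_value, number_of_transitions_simulated); simulation stops
    as soon as the truth value is known.
    """
    t, transitions = 0.0, 0
    while True:
        if psi(state):           # Psi reached within the time bound
            return True, transitions
        if not phi(state):       # Phi violated before Psi was reached
            return False, transitions
        dt, state = step(state, rng)
        t += dt
        transitions += 1
        if t > t_max:            # time bound exceeded without reaching Psi
            return False, transitions


def ctmc_step(rates):
    """Build a step function for a CTMC; rates(state) -> list of (rate, next_state)."""
    def step(state, rng):
        moves = rates(state)
        total = sum(rate for rate, _ in moves)
        if total == 0:                       # absorbing state: never leaves
            return math.inf, state
        dt = rng.expovariate(total)          # exponential holding time
        nxt = rng.choices([s for _, s in moves],
                          weights=[rate for rate, _ in moves])[0]
        return dt, nxt
    return step
```

Each call costs one unit of simulation effort ($m$) per transition, so the loop body mirrors the $O(m \cdot q \cdot \sup I)$ worst case while returning early whenever $\neg\Phi \vee \Psi$ is reached first.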

The second component of the time complexity for verifying $\mathcal{P}_{\geq\theta}[\varphi]$ is the expected sample size, which is
a function of the error bounds $\alpha$ and $\beta$, and the two probability thresholds $p_0$ and $p_1$ (alternatively expressed
using the threshold $\theta$ and the half-width of the indifference region $\delta$). If we use a sequential test, then
the expected sample size also depends on the unknown probability measure $p$ of the set of trajectories that
satisfy $\varphi$. The expected sample size for various acceptance sampling tests was discussed in Section 2.2. For
example, we showed that the sample size for a single sampling plan is approximately proportional to the
logarithm of $\alpha$ and $\beta$, and inversely proportional to the width of the indifference region.
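To make the sample-size component concrete, the sketch below finds the smallest single sampling plan $(n, c)$ for $H_0: p \geq \theta + \delta$ against $H_1: p \leq \theta - \delta$ by brute force. The plan accepts $H_0$ if and only if more than $c$ of the $n$ observations are positive. It uses exact binomial probabilities computed in log space. This is one straightforward way to construct such a plan and is not necessarily the procedure of Section 2.2; the function names are hypothetical. Each candidate $n$ costs $O(n)$ work, so the search is only practical for moderate sample sizes.

```python
import math


def binom_cdfs(n, p):
    """Return [P(X <= k) for k = 0..n] for X ~ Binomial(n, p), 0 < p < 1.
    The pmf is computed in log space to avoid overflow for large n."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    cdf, total = [], 0.0
    for k in range(n + 1):
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
        cdf.append(total)
    return cdf


def single_sampling_plan(theta, delta, alpha, beta, n_max=100_000):
    """Smallest plan (n, c), accepting H0 iff d_n > c, such that
    P(accept H1 | p = theta + delta) <= alpha and
    P(accept H0 | p = theta - delta) <= beta."""
    p0, p1 = theta + delta, theta - delta
    for n in range(1, n_max + 1):
        cdf0, cdf1 = binom_cdfs(n, p0), binom_cdfs(n, p1)
        # Largest c with P(d_n <= c | p0) <= alpha; c = -1 if there is none.
        c = -1
        while c + 1 < n and cdf0[c + 1] <= alpha:
            c += 1
        # A larger c only lowers the type II error, so this c is best for n.
        type2 = 1.0 - (cdf1[c] if c >= 0 else 0.0)
        if type2 <= beta:
            return n, c
    raise ValueError("no plan found with n <= n_max")
```

Because the plan is fixed in advance, the resulting $n$ depends only on $\theta$, $\delta$, $\alpha$, and $\beta$, and not on the model being verified.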

Let $N_p$ denote the expected sample size of the acceptance sampling test we use to verify probabilistic
statements. The verification time for $\mathcal{P}_{\geq\theta}[\mathrm{X}^{I}\,\Phi]$ is then $O(N_p \cdot m)$ and for $\mathcal{P}_{\geq\theta}[\Phi\,\mathrm{U}^{I}\,\Psi]$ it is $O(N_p \cdot m \cdot
q \cdot \sup I)$. Note that there is no direct dependence on the size of the state space of the model, which is in
sharp contrast to numerical solution techniques for probabilistic model checking, whose time complexity is
proportional to the size of the state space (Hansson and Jonsson 1994; Baier et al. 2003).

The time complexity of statistical probabilistic model checking is independent of the size of the state
space for a model if $N_p$, $m$, and $q$ are independent of state space size as well. We can make $N_p$ completely
model independent by using a single sampling plan, in which case $N_p$ depends only on the parameters $\alpha$,
$\beta$, $\theta$, and $\delta$. The factor $m$ is generally both model and implementation dependent and therefore hard to
capture. For generalized semi-Markov processes, for example, $m$ could very well be proportional to the
number of events in the model. It can also be state space dependent, but models often have structure that
can be exploited by the simulator to avoid such dependence. Finally, $q$ is clearly model dependent, but may
be independent of the size of the state space. For example, this is the case for the symmetric polling system
described in Section 6.1.2.
With nested probabilistic operators, the verification time per state along a sample trajectory is no longer
constant. The complexity depends on the level of nesting and the path operators involved. Here, we consider
the UTSL formula $\mathcal{P}_{\geq\theta}\bigl[\mathcal{P}_{\geq\theta'}[\Phi'\,\mathrm{U}^{I'}\,\Psi']\,\mathrm{U}^{I}\,\Psi\bigr]$ with one level of nesting as an example. In the worst
case we need to verify $\mathcal{P}_{\geq\theta'}[\Phi'\,\mathrm{U}^{I'}\,\Psi']$ in $q \cdot \sup I$ states for each of the $N_p$ observations required for the
verification of the outer probabilistic operator. The worst-case complexity for verifying $\mathcal{P}_{\geq\theta'}[\Phi'\,\mathrm{U}^{I'}\,\Psi']$,
assuming $\Phi'$ and $\Psi'$ do not contain any probabilistic operators, is $O(N'_p \cdot m \cdot q \cdot \sup I')$, so the total expected
worst-case complexity is $O(N_p \cdot N'_p \cdot m^2 \cdot q^2 \cdot \sup I \cdot \sup I')$. However, if we use memoization, the expected
worst-case complexity is $O(m \cdot q \cdot (N_p \cdot \sup I + k \cdot N'_p \cdot \sup I'))$ instead, where $k$ is the expected number of
unique states visited within $\sup I + \sup I'$ time units from some initial state. The value of $k$ is in the worst
case $|S|$, the size of the state space, but can be significantly smaller depending on the dynamics of the model
and the time bounds $\sup I$ and $\sup I'$.
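Memoization here means running the statistical test for the inner operator at most once per distinct state and reusing that outcome whenever the state is revisited, in the same or another outer observation. A minimal sketch using Python's `functools.lru_cache`, assuming hashable states, finite trajectory prefixes with the outer time bound already applied, and an opaque `inner_test(state)` callable standing in for the inner acceptance sampling test. All names are illustrative.

```python
from functools import lru_cache


def memoize_inner_test(inner_test):
    """Wrap the (expensive) statistical test for an inner probabilistic
    operator so that it runs at most once per distinct state."""
    stats = {"tests_run": 0}

    @lru_cache(maxsize=None)
    def cached(state):
        stats["tests_run"] += 1
        return inner_test(state)

    return cached, stats


def outer_observation(prefix, inner_holds, psi):
    """Evaluate (inner operator) U Psi over a finite trajectory prefix whose
    states already lie within the outer time bound."""
    for state in prefix:
        if psi(state):
            return True
        if not inner_holds(state):
            return False
    return False
```

The number of inner tests actually run is the number of distinct states reached, which corresponds to $k$, rather than the total number of visits, which corresponds to $N_p \cdot q \cdot \sup I$. The cache is also what drives the space cost discussed next.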

The space complexity of statistical probabilistic model checking is generally quite modest. We need to
store the current state of a sample trajectory when generating an observation for the verification of a
probabilistic UTSL formula, and this typically requires $O(\log |S|)$ space, where $|S|$ is the number of states for
the model. For stochastic discrete event systems that do not satisfy the Markov property, we may also need
to store additional information, such as scheduled trigger times for enabled events in the case of generalized
semi-Markov processes. In the presence of nesting, we may need to store up to $d$ states simultaneously at
any point in time during verification, where $d$ is the maximum depth of a nested probabilistic operator. The
nesting depth for a UTSL formula $\Phi$ is at most $|\Phi|$, so the space requirements are still modest. If we use
memoization to speed up the verification of UTSL formulae with nested probabilistic operators, the space
complexity can be as high as $O(|\Phi| \cdot |S|)$. Memoization, as usual, is a way of trading space efficiency for
time efficiency.

The statistical approach works for infinite-state systems as well, so long as we need to visit only a finite
number of states in order to verify a UTSL formula. This is the case if the conditions of Theorem 5.1 are
satisfied. To verify $\mathcal{P}_{\geq\theta}[\Phi\,\mathrm{U}^{I}\,\Psi]$, the expected number of states that we need to visit is $O(N_p \cdot q \cdot \sup I)$.
The expected number of unique states is $O(\min(N_p \cdot q \cdot \sup I, |S|))$, which becomes the expected space
complexity for memoization with one level of nesting.


Chapter  6 


Empirical Evaluation of Probabilistic Model Checking


In  the  previous  chapter,  we  described  a  statistical  approach  to  probabilistic  model  checking,  and  concluded 
with  a  theoretical  discussion  regarding  the  computational  complexity  of  our  statistical  solution  method.  To 
get  a  better  feeling  for  how  well  our  solution  method  performs  in  practice,  we  evaluate  it  empirically  on  a 
set  of  case  studies  taken  from  the  literature  on  performance  evaluation  and  probabilistic  model  checking. 
We  also  compare  the  statistical  solution  method  with  the  leading  numerical  solution  method  for  transient 
analysis  of  Markov  processes.  The  purpose  of  this  empirical  study  is  to  show  how  the  performance  of  the 
different  solution  methods  depends  on  input  parameters  and  model  characteristics. 

Our  empirical  results  indicate  that  the  statistical  solution  method  scales  better  than  the  numerical  solution 
method  as  the  size  of  the  state  space  increases,  but  that  the  performance  of  the  two  methods  scales  similarly 
as  a  function  of  the  time  bounds  involved  in  the  UTSL  formulae.  We  also  show  that  the  sequential  probability 
ratio test generally outperforms the sequential modification of a single sampling plan, although there are
exceptions to this rule, as was noted already in Section 2.2.3.

The  empirical  evaluation  that  we  present  in  this  chapter  is  meant  as  an  aid  to  practitioners  who  want  to 
use  probabilistic  model  checking  to  verify  their  system  designs.  We  cannot  recommend  a  single  solution 
method  that  is  superior  in  all  cases,  as  the  right  choice  depends  on  characteristics  of  the  model  and  the 
requirements on the accuracy of the model checking result. We show the tradeoffs between accuracy and
speed that exist, and the results we present can help a user make an informed choice regarding solution
method and input parameters.

Figure 6.1: Tandem queuing network with a two-phase Coxian distribution governing the routing time between the
queues.

The  empirical  results  presented  in  this  chapter  were  generated  on  a  3  GHz  Pentium  4  PC  running  Linux, 
and  with  an  800  MB  memory  limit  set  per  process,  unless  noted  otherwise.  The  memory  limit  per  process 
was  set  lower  than  the  physical  memory  limit  of  the  machine  (1  GB)  to  avoid  swapping. 

6.1  Case  Studies 

We  present  two  case  studies,  taken  from  the  literature  on  performance  evaluation  and  probabilistic  model 
checking, and selected to accentuate specific performance characteristics of solution methods for probabilistic
model checking. A third simple example is also introduced to illustrate the use of nested probabilistic
operators  in  UTSL. 

6.1.1  Tandem  Queuing  Network 

The first case study is based on a model of a tandem queuing network presented by Hermanns et al. (1999).
The network consists of two serially connected queues, each with capacity $n$, making the total capacity of the
system $2n$. Figure 6.1 shows a schematic view of the model. Messages arrive at the first queue, get routed
to the second queue after having been in the first queue for some time, and eventually leave the system after
being processed in the second queue. The interarrival time for messages at the first queue is exponentially
distributed with rate $\lambda = 4n$. The processing time at the second queue is exponentially distributed with rate
$\kappa = 4$. The routing time distribution is a two-phase Coxian distribution with parameters $\mu_1 = \mu_2 = 2$ and
$a = 0.9$. The size of the state space for a tandem queuing network of capacity $2n$ is $O(n^2)$.

We will verify whether the probability is less than 0.5 that a system starting out with both queues empty
becomes full within $\tau$ time units. Let $s_i \in \{0, \ldots, n\}$, for $i \in \{1, 2\}$, be the number of messages currently
in the $i$th queue. The tandem queuing network is full if the formula $s_1 = n \wedge s_2 = n$ holds. The UTSL formula
$\mathcal{P}_{<0.5}[\Diamond^{[0,\tau]}\, s_1 = n \wedge s_2 = n]$ represents the property of interest, and we will verify this formula in the state
$s_1 = 0 \wedge s_2 = 0$.
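One observation for this formula is a single simulated trajectory checked for $\Diamond^{[0,\tau]}\, s_1 = n \wedge s_2 = n$. The sketch below simulates the model as a continuous-time Markov chain, under two assumptions that are not stated in the text. First, $a$ is taken to be the probability that routing continues into the second Coxian phase. Second, routing is blocked while the second queue is full, whereas the phase change itself is not. The function name and its default parameters are illustrative.

```python
import random


def tandem_full_within(n, tau, rng, a=0.9, mu1=2.0, mu2=2.0, kappa=4.0):
    """Sample one trajectory of the tandem queuing network from the empty state.

    Returns True iff both queues are full (s1 == n and s2 == n) at some time in
    [0, tau].  Assumed: `a` is the probability of entering the second Coxian
    phase, and routing is blocked while the second queue is full.
    """
    lam = 4.0 * n
    s1, s2, phase = 0, 0, 1   # phase of the routing of queue 1's head message
    t = 0.0
    while True:
        if s1 == n and s2 == n:
            return True
        moves = []                                    # (rate, event name)
        if s1 < n:
            moves.append((lam, "arrive"))
        if s1 > 0 and phase == 1:
            moves.append((a * mu1, "next_phase"))
            if s2 < n:
                moves.append(((1 - a) * mu1, "route"))
        if s1 > 0 and phase == 2 and s2 < n:
            moves.append((mu2, "route"))
        if s2 > 0:
            moves.append((kappa, "depart"))
        t += rng.expovariate(sum(rate for rate, _ in moves))
        if t > tau:
            return False
        event = rng.choices(moves, weights=[rate for rate, _ in moves])[0][1]
        if event == "arrive":
            s1 += 1
        elif event == "next_phase":
            phase = 2
        elif event == "route":
            s1, s2, phase = s1 - 1, s2 + 1, 1
        else:                                         # "depart"
            s2 -= 1
```

An acceptance sampling test would call this once per observation. For instance, `sum(tandem_full_within(3, 50.0, rng) for _ in range(1000)) / 1000` gives a crude point estimate of the probability, which is not itself a hypothesis test.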

6.1.2  Symmetric  Polling  System 

The second case study uses the model of an $n$-station symmetric polling system described by Ibe and Trivedi
(1990). Each station has a single-message buffer and the stations are attended by a single server in cyclic
order. The server begins by polling station 1. If there is a message in the buffer of station 1, the server
starts serving that station. Once station $i$ has been served, or if there is no message in the buffer of station
$i$ when it is polled, the server starts polling station $i + 1$ (or 1 if $i = n$). The polling and service times are
exponentially distributed with rates $\gamma = 200$ and $\mu = 1$, respectively. Messages arrive to the system, as a
whole, according to a Poisson process, and the inter-arrival time is exponentially distributed with rate 1. At
arrival, messages are assigned, with equal probability, to one of the $n$ stations. If a message is assigned to
a station whose buffer is full, then the message is dropped. Another way to think of this is that there is a
separate arrival event for each station, with the inter-arrival time per station being exponentially distributed
with rate $\lambda = 1/n$. The fact that arrival rates are equal for all stations makes the system symmetric. The
size of the state space for a system with $n$ stations is $O(n \cdot 2^n)$.

We will verify the property that, if station 1 is full, then it is polled within $\tau$ time units with probability
at least $\theta$. We do so for different values of $n$, $\tau$, and $\theta$ in the state where station 1 has just been polled and the
buffers of all stations are full. Let $s \in \{1, \ldots, n\}$ be the station currently receiving the server’s attention,
let $a \in \{0, 1\}$ represent the activity of the server (0 for polling and 1 for serving), and let $m_i \in \{0, 1\}$
be the number of messages in the buffer of station $i$. The property of interest is represented in UTSL as
$m_1 = 1 \rightarrow \mathcal{P}_{\geq\theta}[\Diamond^{[0,\tau]}\,\mathit{poll}_1]$, where $\mathit{poll}_1 \equiv s = 1 \wedge a = 0$, and the state in which we verify the formula is
given by $s = 1 \wedge a = 1 \wedge m_1 = 1 \wedge \cdots \wedge m_n = 1$.


6.1.3  Robot  Grid  World 


The third case study involves a robot navigating in a grid world, and was introduced by Younes et al. (2004)
to illustrate the verification of formulae with nested probabilistic operators. We have an $n \times n$ grid world
with a robot moving from the bottom left corner to the top right corner. The robot first moves along the
bottom edge and then along the right edge. In addition to the robot, there is a janitor moving randomly
around the grid. Figure 6.2 provides a schematic view of a grid world with $n = 5$.

Figure 6.2: A grid world with a robot (R) in the bottom left corner and a janitor (J) in the center. The dashed arrow
indicates the path of the robot. The janitor moves with equal probability to any of the adjacent squares.

The robot moves at rate $\lambda_R = 1$, unless the janitor occupies the destination square, in which case the
robot remains stationary. The janitor moves around randomly in the grid world at rate $\lambda_J = 2$, selecting
the destination from the set of neighboring squares according to a discrete uniform distribution. The robot
initiates communication with a base station at rate $\rho = 1/10$, and the duration of each communication
session is exponentially distributed with rate $\kappa = 1/2$.

The objective is for the robot to reach the goal square at the top right corner within $\tau_1$ time units with
probability at least 0.9, while maintaining at least a 0.5 probability of periodically communicating with the
base station. Let $c$ be a Boolean state variable that is true when the robot is communicating with the base
station, and let $x$ and $y$ be two integer-valued state variables holding the current location of the robot. The
UTSL formula $\mathcal{P}_{\geq 0.9}\bigl[\mathcal{P}_{\geq 0.5}[\Diamond^{[0,\tau_2]}\, c]\,\mathrm{U}^{[0,\tau_1]}\, x = n \wedge y = n\bigr]$ expresses the desired objective. The robot
moves along a line only, so the size of the state space for the robot grid world is $O(n^3)$.

6.2  Evaluation  of  Statistical  Solution  Method 

As  discussed  in  Section  5.4,  there  are  two  main  factors  influencing  the  verification  time  for  the  statistical 
approach:  the  sample  size  required  to  achieve  prescribed  accuracy  and  the  length  of  trajectory  prefixes  (in 
terms  of  state  transitions)  required  to  determine  if  a  path  formula  holds. 

The sample size depends on the sampling plan that we choose to use, the error bounds $\alpha$ and $\beta$ that
we want to guarantee, the threshold $\theta$, as well as our choice of $\delta(\theta)$ determining the half-width of an
indifference region centered around $\theta$. For sequential sampling plans, the sample size is a random variable
whose expectation also varies with $p$, which in our case is the probability measure of a set of trajectories
satisfying a path formula. The approximation formulae for the expected sample size of various sampling
plans provided in Section 2.2 give us some idea of what to expect, and the empirical results presented in this
section show the actual performance on the various case studies.

The expected length of trajectories varies with the model and the path formula, as we will see. If we
are lucky, we can verify a time-bounded path formula over a sample trajectory by considering only a short
prefix that ends long before the time bound is exceeded. For some models, however, the number of state
transitions that occur in a given time interval may be large, even if the interval is short, and this will lead to
longer verification times.

6.2.1  Comparing  Sampling  Plans 

We  consider  two  different  sampling  plans  introduced  in  Section  2.2:  the  sequential  version  of  a  single 
sampling  plan  (Algorithm  2.2)  and  Wald’s  sequential  probability  ratio  test  (SPRT;  Algorithm  2.3).  We  do 
not  include  experiments  with  a  non-sequential  single  sampling  plan.  There  is  of  course  a  slight  overhead 
introduced  by  using  a  sequential  stopping  rule  with  a  single  sampling  plan,  but  this  overhead  is  negligible 
(essentially  three  additional  integer  operations  per  iteration).  The  reduction  in  expected  sample  size  that  we 
get  from  using  a  sequential  stopping  rule  dominates  the  small  overhead  required  to  test  for  early  termination. 
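For reference, Wald's SPRT for Bernoulli observations can be sketched in a few lines. The sketch tests $H_0: p \geq p_0 = \theta + \delta$ against $H_1: p \leq p_1 = \theta - \delta$ using the standard log-likelihood-ratio form: accept $H_0$ when the ratio of $H_1$ to $H_0$ likelihoods drops to $\beta/(1-\alpha)$, and accept $H_1$ when it reaches $(1-\beta)/\alpha$. It is a textbook rendering and may differ in detail from Algorithm 2.3. The function name is illustrative.

```python
import math


def sprt(observations, theta, delta, alpha, beta):
    """Wald's SPRT for H0: p >= theta + delta vs H1: p <= theta - delta.

    Returns (accepted hypothesis, number of observations used), or
    (None, m) if the observations run out before a decision is reached.
    """
    p0, p1 = theta + delta, theta - delta
    log_a = math.log((1 - beta) / alpha)   # accept H1 at or above this
    log_b = math.log(beta / (1 - alpha))   # accept H0 at or below this
    inc1 = math.log(p1 / p0)               # log-LR contribution of a positive observation
    inc0 = math.log((1 - p1) / (1 - p0))   # log-LR contribution of a negative observation
    llr, m = 0.0, 0
    for x in observations:
        m += 1
        llr += inc1 if x else inc0
        if llr <= log_b:
            return "H0", m
        if llr >= log_a:
            return "H1", m
    return None, m
```

The per-iteration work is one addition and two comparisons, so the cost of the SPRT is dominated by the number of observations it consumes, just as for the simple sequential test.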

Figures 6.3 and 6.4 present data for the tandem queuing network and symmetric polling system case
studies, respectively. In each case, we show verification time for the simple sequential sampling plan and
the SPRT using four different test strengths (subfigures (a) and (b)). We also give details of both sample size
(subfigures (c) and (d)) and trajectory length (subfigures (e) and (f)). For all data, we plot the results both
against model size (subfigures (a), (c), and (e)) and against the time bound of the path formula (subfigures
(b), (d), and (f)). Each data point is an average over 20 runs. We used $\delta(\theta) = 5 \cdot 10^{-3}$ as the half-width of
the indifference region. Furthermore, we used a symmetric test strength ($\alpha = \beta$) across the board.

Our data shows that the SPRT outperforms the simple sequential test in almost all cases, and by a wide
margin. We can typically solve the same model checking problem with the SPRT using test strength $10^{-8}$
in shorter time than it takes to solve the same problem with a simple sequential test using test strength $10^{-1}$.
(a) Verification time as a function of state space size. (b) Verification time as a function of time bound.
(c) Sample size as a function of state space size. (d) Sample size as a function of time bound.
(e) Trajectory length as a function of state space size. (f) Trajectory length as a function of time bound.

Figure 6.3: Empirical results for the tandem queuing network ($\theta = 0.5$), with $\tau = 50$ (left) and $n = 63$ (right), using
acceptance sampling with $2\delta = 10^{-2}$ and symmetric error bounds $\alpha = \beta$ equal to $10^{-8}$ (a), $10^{-4}$ (□), $10^{-2}$ (v), and
$10^{-1}$ (o). The average trajectory length is the same for all values of $\alpha$ and $\beta$. The dotted lines mark a change in the
truth value of the formula being verified.
(a) Verification time as a function of state space size. (b) Verification time as a function of time bound.
(c) Sample size as a function of state space size. (d) Sample size as a function of time bound.
(e) Trajectory length as a function of state space size. (f) Trajectory length as a function of time bound.

Figure 6.4: Empirical results for the symmetric polling system ($\theta = 0.5$), with $\tau = 20$ (left) and $n = 10$ (right), using
acceptance sampling with $2\delta = 10^{-2}$ and symmetric error bounds $\alpha = \beta$ equal to $10^{-8}$ (a), $10^{-4}$ (□), $10^{-2}$ (v), and
$10^{-1}$ (o). The average trajectory length is the same for all values of $\alpha$ and $\beta$. The dotted lines mark a change in the
truth value of the formula being verified.
The difference in performance is due entirely to a difference in sample size, as the average trajectory length
is the same for both tests regardless of strength. The average sample size for the SPRT roughly doubles
when the test strength goes from $10^{-x}$ to $10^{-2x}$, while the average sample size for the simple sequential test
more than doubles (and often more than triples) for the same change in test strength.
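Wald's classical approximation of the SPRT's expected sample size offers one way to see why the growth is roughly twofold. When the true probability equals $p_0$ or $p_1$, the expected sample size is approximately the expected log-likelihood-ratio boundary reached divided by the expected per-observation drift. For small $\alpha = \beta$, both boundaries are close to $\pm\log(1/\alpha)$, so squaring $\alpha$ roughly doubles them. This sketch applies only at $p = p_0$ or $p = p_1$, not inside the indifference region, and the function name is illustrative.

```python
import math


def sprt_asn(p, theta, delta, alpha, beta):
    """Wald's approximation of the SPRT's expected sample size when the true
    probability p equals p0 = theta + delta or p1 = theta - delta."""
    p0, p1 = theta + delta, theta - delta
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    # Expected log-likelihood-ratio increment per observation under p.
    drift = p * math.log(p1 / p0) + (1 - p) * math.log((1 - p1) / (1 - p0))
    if math.isclose(p, p0):
        prob_accept_h0 = 1 - alpha
    elif math.isclose(p, p1):
        prob_accept_h0 = beta
    else:
        raise ValueError("approximation implemented only for p = p0 or p = p1")
    return (prob_accept_h0 * log_b + (1 - prob_accept_h0) * log_a) / drift
```

With $\theta = 0.5$ and $\delta = 5 \cdot 10^{-3}$, the ratio `sprt_asn(0.505, 0.5, 0.005, 1e-8, 1e-8) / sprt_asn(0.505, 0.5, 0.005, 1e-4, 1e-4)` is very close to 2. The approximation ignores overshoot of the boundaries and says nothing about the peaks observed near the threshold.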

The vertical dotted lines in the figures indicate a change in the truth value of the UTSL formula that is
being verified. This line marks the value of $|S|$ or $\tau$ where the probability measure for the set of trajectories
satisfying the path formula is exactly equal to the probability threshold $\theta$. We can see that the average
sample size for both tests peaks in the vicinity of the dotted line, with the peaks for the SPRT being more
pronounced than those for the simple sequential test.

The average trajectory length for the tandem queuing network increases linearly with the capacity $n$ of
the queues. This is because the arrival rate for messages is $4n$, so the average number of state transitions that
occur in a fixed interval of time increases with $n$. Note, however, that the size of the state space is $O(n^2)$
for the tandem queuing network, so the average trajectory length is proportional to the square root of $|S|$
(Figure 6.3(e)). Thus, the average trajectory length, and therefore also the overall time complexity for the
statistical solution method, is sublinear in the size of the state space. In contrast, the rates for the symmetric
polling system are independent of the size of the state space. Initially, the average trajectory length increases
with the size of the state space (Figure 6.4(e)) because it takes longer to achieve $\mathit{poll}_1$ with more polling
stations. As the state space increases further, the probability of achieving $\mathit{poll}_1$ in the interval $[0, \tau]$ goes
to zero, and all sample trajectories end with the time bound $\tau$ being exceeded. The expected number of
state transitions occurring in the interval $[0, \tau]$ is the same for all state space sizes, since the exit rates are
constant, so the verification time does not increase for larger state spaces.

As a function of the time bound $\tau$ (Figure 6.3(f)), for a fixed $n$, the average trajectory length grows
linearly with $\tau$ for the tandem queuing network, at least for sufficiently large values of $\tau$. The same is
true for the symmetric polling system (Figure 6.4(f)) for small values of $\tau$, but as $\tau$ increases so does the
probability of achieving $\mathit{poll}_1$ in the interval $[0, \tau]$ (Figure 6.5), and the average trajectory length approaches
a constant value as $\tau$ increases. This shows that the performance of the statistical solution method depends
on the formula being verified in a more complex way than simply through the time bounds of path formulae.

While the SPRT typically has a smaller expected sample size than the simple sequential test for the same
test strength, a clear exception is seen in Figure 6.4(d). We witness the same phenomenon in Figure 6.6 for
a different $\theta$ (0.9 instead of 0.5), which also shows the variation in performance based on the threshold. For
$\theta = 0.5$, the sample size is the same on both sides of the vertical dotted line, but it is notably lower to the
left of the line for $\theta = 0.9$. There is a sharp peak in the expected sample size for the SPRT close to where
the truth value of the UTSL formula changes, as indicated by the dotted line. For $\alpha = \beta$ equal to $10^{-4}$
and $10^{-8}$, the SPRT has a larger expected sample size than the simple sequential test. We can see this more
clearly in Figure 6.7, where we have zoomed in on the relevant region. The gray area indicates the range of
$\tau$ for which the probability measure, $p$, of the set of trajectories satisfying the path formula $\Diamond^{[0,\tau]}\,\mathit{poll}_1$ is in
the indifference region $(\theta - \delta, \theta + \delta)$. We can see that there is a sharp increase in the expected sample size
for the SPRT in and near the indifference region, while the expected sample size for the simple sequential
test remains largely unchanged. Still, it is only for a very narrow range of $\tau$ that the simple sequential test
outperforms the SPRT on average, for this particular choice of $\delta$ ($5 \cdot 10^{-3}$). We would not expect $p$ to be
this close to $\theta$ for typical model checking problems. Furthermore, neither of the two tests gives any valuable
accuracy guarantees in the indifference region. If we do expect $p$ to be very close to $\theta$, and we want to know
on which side of the threshold $p$ really is, then we may have to resort to numerical solution techniques.

Figure 6.5: Probability $p$ of the set of trajectories satisfying the path formula $\Diamond^{[0,\tau]}\,\mathit{poll}_1$ for the symmetric
polling system.

Figure 6.6: Sample size as a function of the formula time bound for the symmetric polling system ($\theta = 0.9$ and
$n = 10$), using acceptance sampling with $2\delta = 10^{-2}$ and symmetric error bounds $\alpha = \beta$ equal to $10^{-8}$ (a), $10^{-4}$
(□), $10^{-2}$ (v), and $10^{-1}$ (o).

(a) $\theta = 0.5$ (b) $\theta = 0.9$

Figure 6.7: Sample size as a function of the formula time bound for the symmetric polling system ($n = 10$) in
the vicinity of the indifference region for two different values of $\theta$, using acceptance sampling with $2\delta = 10^{-2}$ and
symmetric error bounds $\alpha = \beta$ equal to $10^{-8}$ (a), $10^{-4}$ (□), $10^{-2}$ (v), and $10^{-1}$ (o). The indifference region is
indicated by a shaded area.

We can increase the accuracy of the model checking result by strengthening the test (decreasing $\alpha$ and $\beta$)
or narrowing the indifference region. Figure 6.8 shows how the expected sample size for the two sampling
plans depends on the half-width of the indifference region. The plots are for the symmetric polling system
with $n = 10$ and two different values of $\theta$ and $\tau$. We can see that it is generally more costly to narrow the
indifference region when using the simple sequential test rather than the SPRT. For example, we can have an
indifference region of half-width $10^{-5}$ with the SPRT at essentially the same cost as $10^{-3}$ with the simple
sequential test. For $\delta = 10^{-1}$ in the right plot, the upper border of the indifference region is 1, which means
that both the SPRT and the simple sequential test become a curtailed single sampling plan. This explains
the drop in expected sample size at this point.


6.2.2  “Five  Nines” 

For safety critical systems, we want to ensure that the probability of failure is very close to zero. While
guaranteeing a zero probability of failure is usually unrealistic, it is not uncommon to require the failure
probability of a safety critical system to be at most 10^-4 or 10^-5. A failure probability of at most 10^-5
means a success probability of at least 1 − 10^-5 = 0.99999, commonly referred to as “five nines.” For such high
accuracy requirements, it is typically best to use numerical solution techniques, but if the model is non-Markovian
or has a large state space, this may not be a viable choice.

Figure 6.8: Sample size as a function of the half-width of the indifference region for the symmetric polling system,
using acceptance sampling with symmetric error bounds α = β equal to 10^-8 (△), 10^-4 (□), 10^-2 (▽), and 10^-1 (○).

To use statistical hypothesis testing with a probability threshold 1 − 10^-5, we need to use an indifference
region with half-width at most 10^-5. An indifference region that narrow requires a large average sample size
if the success probability is close to one, as we would expect it to be for a good system design. A possible
solution is to set the indifference region to (1 − 10^-5, 1) and use a curtailed single sampling plan. We need up
to $n = \lceil \log\beta / \log(1 - 10^{-5}) \rceil$ observations for such a sampling plan, where β is the maximum probability
that we accept the system as safe if the success probability is at most 1 − 10^-5. We accept the system
as safe if all n observations are positive, but reject the system as unsafe at the first negative observation.
This means that if the success probability for the system is far below acceptable, we will quickly reject
the system, but acceptance always requires n observations. Note, however, that we will never need more
than n observations, so the maximum effort for verifying the system is known. Figure 6.9 plots the average
verification time, as a function of the formula time bound, for the symmetric polling system (n = 10) with
indifference regions (0.99999, 1) and (0.999985, 0.999995), of which the former leads us to use a curtailed
single sampling plan. In the latter case (solid curves), the SPRT was used.
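The plan size and decision rule above can be sketched as follows. The sketch assumes independent Bernoulli observations supplied by a hypothetical `sample` function. `curtailed_plan_size` and `curtailed_single_sampling` are illustrative names. With β = 10^-8 and lower border 1 − 10^-5, the bound evaluates to just over 1.8 million observations.

```python
import math


def curtailed_plan_size(beta, p1):
    """Smallest n with p1**n <= beta: if the success probability is at most
    p1, n positives in a row occur with probability at most beta."""
    return math.ceil(math.log(beta) / math.log(p1))


def curtailed_single_sampling(sample, beta, p1):
    """Accept after n consecutive positives; reject at the first negative.

    Returns (accept, number_of_observations_used).
    """
    n = curtailed_plan_size(beta, p1)
    for m in range(1, n + 1):
        if not sample():
            return False, m  # first negative observation: reject immediately
    return True, n  # acceptance always needs all n observations


if __name__ == "__main__":
    print(curtailed_plan_size(1e-8, 1 - 1e-5))  # 1842059
```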

First, consider the indifference region with 1 as upper bound, which leads to a curtailed single sampling
plan. We can see that for low values of τ, the average verification time is negligible, simply because we get
a negative observation very quickly and reject the system design as unacceptable. As τ increases and the
success probability approaches 1 − 10^-5, the average sample size increases. As we pass the point at which
the success probability exceeds 1 − 10^-5 (roughly at τ = 29.57), the sample size settles at around 2·10^6
for β = 10^-8. The verification time at this point is just under 11 minutes on our test machine (the average
trajectory length is just over 23).
Figure 6.9: Verification time as a function of the formula time bound for the symmetric polling system (n = 10),
using acceptance sampling with 2δ = 10^-5 and symmetric error bounds α = β equal to 10^-8 (△) and 10^-1 (○).

               Acceptance probability
  β          ≥ .5     ≥ .9     ≥ .99    ≥ .999
  10^-8      34.1     36.6     39.7     42.6
  10^-4      33.2     35.8     38.9     41.8
  10^-2      32.3     34.8     37.9     40.9
  10^-1      31.3     33.9     37.0     40.0

Table 6.1: Minimum value of formula time bound τ, for the symmetric polling system (θ = 0.999995 and n = 10),
that leads to an acceptance probability of at least .5, .9, .99, and .999, respectively, for 2δ = 10^-5 and four
different values of β.


We control the probability of error with the parameters α and β. By setting β low, we guarantee a
low probability of accepting a poor system design, and by setting α low, we guarantee a low probability
of rejecting a good system design. A curtailed single sampling plan is an efficient way of dealing with
probability thresholds close to 1, but it gives us no control over the risk of rejecting a good system design,
except that we will never reject a system design with success probability 1. This may lead us to reject many
system designs that in practice are acceptable, or we may have to relax the system requirements. Table 6.1
shows the value of τ for the symmetric polling system that leads to acceptance with a certain probability
for different values of β. For example, to guarantee that a poor system design is accepted with probability
at most 10^-8, τ needs to be at least 42.6 for acceptance of the symmetric polling system with probability
at least 0.999. In reality, the probability that poll_1 becomes true within τ time units is sufficiently high for
τ = 29.57, but using that time bound for verification would almost certainly lead us to reject the system.
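The lack of control over α can be made explicit. Assuming independent observations, a curtailed single sampling plan of size n accepts with probability p^n. Acceptance with probability at least q therefore requires p ≥ q^{1/n}. The sketch below evaluates both quantities, with hypothetical helper names and n computed from the formula in this section with β = 10^-8.

```python
import math

BETA = 1e-8
P1 = 1 - 1e-5  # lower border of the indifference region (1 - 10^-5, 1)
N = math.ceil(math.log(BETA) / math.log(P1))  # plan size from the text


def acceptance_probability(p, n=N):
    """Probability that a curtailed single sampling plan of size n accepts
    when each observation is positive independently with probability p."""
    return p ** n


def min_success_probability(q, n=N):
    """Smallest success probability p with acceptance probability >= q."""
    return q ** (1.0 / n)


if __name__ == "__main__":
    print(acceptance_probability(P1))  # at most BETA by construction
    print(1 - min_success_probability(0.999))  # roughly 5.4e-10
```

For q = 0.999, the tolerable failure probability comes out at roughly 5.4·10^-10, several orders of magnitude below the 10^-5 requirement. This is why τ must grow well beyond 29.57 before acceptance becomes likely.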

To ensure a non-trivial bound on the risk of rejecting an acceptable system design, we need to move
the upper bound of the indifference region away from 1. Finding a single sampling plan for an indifference
region as narrow as 10^-5 is generally not feasible (cf. Figure 6.8), so we use only the SPRT in this case.
This means that, in contrast to a curtailed single sampling plan, there is no upper bound on the sample size.
The solid curves in Figure 6.9 show the average verification time for the SPRT with indifference region
(0.999985, 0.999995). We can see clear peaks in the verification time where the probability is close to
1 − 10^-5. The price for moving the upper bound of the indifference region away from 1 is that verification
can take over an hour on average instead of a few minutes. One of the 20 experiments for α = β = 10^-8
required a sample size of over 35 million, which can be compared to a maximum sample size of just over
1.8 million for the curtailed single sampling plan with β = 10^-8.

6.2.3  Nested  Probabilistic  Operators 

We  use  the  robot  grid  world  case  study  to  show  results  of  verification  with  nested  probabilistic  operators.  We 
have  proven  that  a  statistical  approach  is  possible  even  in  the  presence  of  nested  probabilistic  operators,  with 
Theorem  5.8  being  the  key  theoretical  result.  A  practical  concern,  however,  is  that  such  verification  could 
be  costly,  since  each  observation  for  the  outer  probabilistic  operator  involves  an  acceptance  sampling  test 
for  the  inner  probabilistic  operators.  Nevertheless,  our  empirical  results  suggest  that  a  statistical  approach 
is,  in  fact,  tractable. 

Figure 6.10 shows empirical data for the robot grid world case study for verifying the UTSL formula
$\mathcal{P}_{\geq 0.9}[\mathcal{P}_{\geq 0.5}[\Diamond^{[0,\tau_2]}\, c]\ \mathcal{U}^{[0,\tau_1]}\ x=n \wedge y=n]$. This formula asserts that the probability is high (at least 0.9)
that the robot reaches the goal position while periodically communicating with the base station. The time
bounds τ_1 and τ_2 were set to 100 and 9, respectively. We used the SPRT exclusively, with memoization
enabled, and the heuristic proposed in Definition 5.3 to select the nested error bounds. It turns out that with
τ_2 = 9, the probability measure of the set of paths satisfying $\Diamond^{[0,\tau_2]}\, c$ is $1 - e^{-0.9} \approx 0.593$, independent
of the start state. We used an indifference region with half-width δ independent of θ. For both values of
δ that we used, δ = 0.05 and δ = 0.025, 0.593 is more than a δ-distance from the threshold 0.5 for the
inner probabilistic operator, so we will have a low probability of erroneously verifying the path formula
$\mathcal{P}_{\geq 0.5}[\Diamond^{[0,\tau_2]}\, c]\ \mathcal{U}^{[0,\tau_1]}\ x=n \wedge y=n$ over sample trajectories. For the outer probabilistic operator, we
used the symmetric error bounds α = β = 10^-2. The heuristic gave us the symmetric nested error bounds
0.0153 and 0.00762 for δ = 0.05 and δ = 0.025, respectively.
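The claim that 0.593 lies outside the indifference region of the inner operator for both values of δ follows from a small calculation. The Python below reproduces it using only the value 1 − e^{−0.9} stated above. The variable names are illustrative.

```python
import math

p_inner = 1 - math.exp(-0.9)  # probability of the inner path formula (text)
theta_inner = 0.5  # threshold of the inner probabilistic operator

for delta in (0.05, 0.025):
    margin = p_inner - theta_inner
    # The inner test is reliable when p lies outside (theta - delta, theta + delta).
    print(f"delta={delta}: p={p_inner:.3f}, margin={margin:.3f}, "
          f"outside region: {margin > delta}")
```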

Figure 6.10: Empirical results for the robot grid world (τ_1 = 100 and τ_2 = 9), using acceptance sampling with
symmetric error bounds α = β = 10^-2. Panels: (a) verification time as a function of state space size; (b) sample
size as a function of state space size; (c) trajectory length as a function of state space size; (d) fraction of unique
states among visited states. The average trajectory length is the same for all values of δ. The dotted lines
mark a change in the truth value of the formula being verified.

Figure 6.11: Verification time as a function of the nested error. The dotted line marks the maximum nested error.

Figure 6.12: Fraction of verification time as a function of state space size for the symmetric polling system (τ = 20)
when using distributed acceptance sampling with two machines instead of one.

Figure 6.13: Distribution of workload as a function of state space size for the symmetric polling system (τ = 20)
when using distributed acceptance sampling with m = 2.

We can see in Figure 6.10(b) the familiar peak in the average sample size where the value of the UTSL
formula goes from true to false. Note, however, that the peak is not present in Figure 6.10(a), where the
verification time is plotted as a function of the state space size. This is due to memoization. Figure 6.10(d)
shows the fraction of unique states among all states visited along sample trajectories for the outer probabilistic
operator. This graph is almost the mirror image of that for the average sample size. As we generate
more sample trajectories, the probability increases that we visit states that have been visited before. With
memoization, we do not need to verify nested probabilistic statements more than once in a visited state, so
the cost per observation drops over time. The net effect is that total verification time is notably reduced.
The price we pay for the improved efficiency is that we use more memory. However, the number of unique
visited states is still only a tiny fraction of the total number of states for the robot grid world, resulting in
modest memory requirements.
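Memoization of nested verdicts amounts to a per-state cache. The class below is a minimal sketch with hypothetical names. `inner_test` stands for whatever acceptance sampling test is run for the nested operator. The demo uses random verdicts and random walks on a small grid purely for illustration. Because a cached verdict is reused, every later visit to the same state sees the same (possibly erroneous) outcome, and memory grows with the number of unique states.

```python
import random


class MemoizedNestedCheck:
    """Caches the verdict of a nested probabilistic statement per state.

    `inner_test(state)` is a hypothetical callable that runs an acceptance
    sampling test for the nested operator and returns True or False.
    """

    def __init__(self, inner_test):
        self.inner_test = inner_test
        self.cache = {}  # state -> verdict of the nested test
        self.visits = 0  # states queried along all trajectories
        self.inner_runs = 0  # nested tests actually carried out

    def holds(self, state):
        self.visits += 1
        if state not in self.cache:
            self.inner_runs += 1
            self.cache[state] = self.inner_test(state)
        return self.cache[state]

    def fraction_unique(self):
        """Fraction of unique states among all visited states."""
        return len(self.cache) / self.visits if self.visits else 0.0


if __name__ == "__main__":
    random.seed(0)
    check = MemoizedNestedCheck(lambda s: random.random() < 0.593)
    for _ in range(1000):  # 1000 short random walks on a 10 x 10 grid
        x = y = 0
        for _ in range(20):
            check.holds((x, y))
            x = min(9, x + random.randint(0, 1))
            y = min(9, y + random.randint(0, 1))
    print(check.inner_runs, check.visits, round(check.fraction_unique(), 3))
```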

Figure 6.11 shows the effectiveness of our heuristic for selecting the nested error. We plot the verification
time as a function of the symmetric nested error for δ = 0.05 and three different values of n (the size of
the grid). The cross on each curve marks the performance we get by using our heuristic. We do not obtain
optimal performance, but we are only off by a factor of 1.3 to 1.4. Note that selecting a nested error that is
too high or too low could easily result in performance worse than optimal by orders of magnitude, so our
heuristic does reasonably well.


6.2.4  Distributed  Acceptance  Sampling 

Acceptance sampling may require millions of observations, but each observation represents an independent
chance experiment. This means that we can carry out multiple experiments in parallel, which could result
in a substantial reduction in verification time. When using a sequential sampling plan, we need to be careful
not to introduce bias against observations that take a long time to generate. It is necessary to decide a priori
on an order in which observations from nodes working in parallel will be taken into consideration, and not
simply incorporate observations as they are generated. We addressed this problem in Section 5.3.1, where
we proposed a way to schedule the processing of observations that dynamically adjusts to a heterogeneous
environment (e.g. observations being generated on CPUs of different speed).
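The following sketch illustrates the simplest a priori ordering, a fixed round-robin over the nodes. It is not the adaptive scheme of Section 5.3.1, and the class name and interface are hypothetical. Observations that arrive before their turn are buffered, so a slow node can delay the test but can never be bypassed.

```python
from collections import deque


class FixedOrderMerger:
    """Releases observations from parallel nodes in a fixed round-robin order.

    The order is decided before sampling starts, so the sequential test never
    favors observations that happen to be generated quickly. Observations that
    arrive ahead of their turn are buffered.
    """

    def __init__(self, num_nodes):
        self.buffers = [deque() for _ in range(num_nodes)]
        self.turn = 0  # index of the node whose observation is due next

    def receive(self, node, observation):
        self.buffers[node].append(observation)

    def drain(self):
        """Return all observations that can be released in order right now."""
        released = []
        while self.buffers[self.turn]:
            released.append(self.buffers[self.turn].popleft())
            self.turn = (self.turn + 1) % len(self.buffers)
        return released


if __name__ == "__main__":
    merger = FixedOrderMerger(2)
    for obs in (1, 1, 0):  # the fast node delivers three observations
        merger.receive(0, obs)
    print(merger.drain())  # [1]: node 1 is due next and has none yet
    merger.receive(1, 0)
    print(merger.drain())  # [0, 1]
```

A fixed 1:1 order leaves the faster node's surplus waiting in its buffer. Adapting the order to the nodes' observed throughput, as Section 5.3.1 proposes, avoids that waste.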

Figure 6.12 shows the reduction in verification time as a function of the state space size for the symmetric
polling system when using two machines to generate observations. The first machine is equipped with a
Pentium III 733 MHz processor. If we also generate observations, in parallel, on a machine with a Pentium
III 500 MHz processor, we get the relative performance indicated by the solid curve. The verification time
with two machines is roughly 70 percent of the verification time with a single machine. Figure 6.13 shows
the fraction of observations used from each machine, with m_1 being the machine with a 733 MHz processor
and m_2 being the machine with a 500 MHz processor. We can see that these fractions are in line with
the relative performance of the two machines, and this is achieved without any explicit communication of
performance characteristics.¹

¹ Roughly 65 percent of the observations are generated by m_1. The CPU speed of m_1 (733 MHz) is just over 59 percent of the
combined CPU speed for both m_1 and m_2 (1233 MHz), but this does not account for other factors (e.g. cache size) that also impact
performance.


6.3 Comparison with Numerical Solution Method

To verify the UTSL formula $\mathcal{P}_{\bowtie\theta}[\Phi\ \mathcal{U}^{[0,\tau]}\ \Psi]$ in some state s ∈ S of a model 𝓜 with state space S, we
can compute the probability $p = \mu(\{\sigma \in \mathit{Path}(\{(s,0)\}) \mid \mathcal{M}, \sigma, 0 \models \Phi\ \mathcal{U}^{[0,\tau]}\ \Psi\})$ numerically and test if
p ⋈ θ holds.

First, as initially proposed by Baier et al. (2000), the problem is reduced to the computation of transient
probabilities on a modified model 𝓜′, where all states in 𝓜 satisfying ¬Φ ∨ Ψ have been made absorbing.
The probability p is equal to the probability that we are in a state satisfying Ψ at time τ in model 𝓜′.
This probability can be computed using a technique called uniformization (also known as randomization),
originally proposed by Jensen (1953). The computation of p is expressed as an infinite sum, with each term
involving a matrix-vector multiplication. In practice, the infinite summation is truncated by using the
techniques of Fox and Glynn (1988), so that the truncation error is bounded by an a priori error tolerance ε. The
number of iterations required to achieve truncation error ε is R_ε. The value of R_ε is $q \cdot \tau + k\sqrt{2 q \cdot \tau} + 3/2$,
where q is the maximum exit rate for the model and k is $O(\sqrt{\log(1/\varepsilon)})$ (Fox and Glynn 1988). This means
that the number of iterations grows very slowly as ε decreases. For large values of q·τ, the number of
iterations is essentially O(q·τ). Each iteration involves a matrix-vector multiplication and each such operation
takes O(M) time, where M is the number of non-zero entries in the rate matrix Q for the continuous-time
Markov process 𝓜. The time complexity for the numerical solution technique is therefore O(q·τ·M) (cf.
Malhotra et al. 1994). This is in comparison to the theoretical time complexity O(q·τ·N_p) for our statistical
solution method, where N_p is the expected sample size as a function of p. In the worst case M is O(|S|^2),
but it is typically O(|S|). N_p, on the other hand, is often much smaller than |S| for large state spaces.
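A minimal uniformization sketch in pure Python follows. It is illustrative only. It stores the matrix densely, so each iteration costs O(|S|^2) rather than O(M). It replaces the Fox and Glynn bounds with a simple rule that stops once the accumulated Poisson mass exceeds 1 − ε. It also fails for very large q·τ, where e^{−q·τ} underflows. The helper `make_absorbing` mirrors the reduction of Baier et al. (2000). All names are hypothetical.

```python
import math


def make_absorbing(rates, absorbing):
    """Return a copy of the rate matrix in which the states in `absorbing`
    have no outgoing transitions (the reduction to the model M')."""
    return [[0.0 if i in absorbing else r for r in row]
            for i, row in enumerate(rates)]


def transient_distribution(rates, init, t, eps=1e-10):
    """Distribution at time t of a CTMC, computed by uniformization.

    rates[i][j] (i != j) is the rate of moving from state i to state j.
    The Poisson series is cut off once its accumulated mass exceeds
    1 - eps, a simple stand-in for the Fox-Glynn truncation bounds.
    """
    n = len(rates)
    exit_rates = [sum(rates[i][j] for j in range(n) if j != i)
                  for i in range(n)]
    q = max(exit_rates)  # uniformization rate: maximum exit rate
    if q == 0.0:
        return list(init)  # every state is absorbing
    # Uniformized DTMC P = I + Q/q, stored densely for simplicity.
    P = [[rates[i][j] / q if i != j else 1.0 - exit_rates[i] / q
          for j in range(n)] for i in range(n)]
    weight = math.exp(-q * t)  # Poisson(q*t) probability of zero jumps
    if weight == 0.0:
        raise ValueError("q*t too large for this naive sketch")
    vec = list(init)
    result = [weight * v for v in vec]
    mass, k = weight, 0
    while mass < 1.0 - eps:
        k += 1
        # One step of the uniformized DTMC: vec <- vec * P.
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k  # Poisson(q*t) probability of k jumps
        mass += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


if __name__ == "__main__":
    rates = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
    dist = transient_distribution(make_absorbing(rates, {2}), [1.0, 0.0, 0.0], 1.0)
    print(dist[2])
```

Take the chain 0 → 1 → 2 with unit rates and Ψ holding only in state 2. The printed value approximates 1 − 2/e ≈ 0.264, the probability of reaching state 2 within one time unit.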

The  number  of  iterations  required  by  the  numerical  solution  method  can,  in  some  cases,  be  reduced 
significantly  through  the  use  of  steady-state  detection  (Reibman  and  Trivedi  1988;  Malhotra  et  al.  1994; 
Younes  et  al.  2004).  Further  reduction  is  possible  by  using  the  sequential  stopping  rule  described  by  Younes 
et  al.  (2004),  although  this  does  not  reduce  the  asymptotic  time  complexity  of  the  numerical  solution  method. 

The limiting factor for the numerical solution method is typically memory. The space complexity for
verifying the formula $\mathcal{P}_{\bowtie\theta}[\Phi\ \mathcal{U}^{[0,\tau]}\ \Psi]$ is O(|S|) in most cases. For the results presented in this section, we
use the hybrid approach proposed by Parker (2002), which uses flat representations of vectors and symbolic
data structures, such as BDDs (Bryant 1986) and MTBDDs (Clarke et al. 1993; Bahar et al. 1993; Fujita
et al. 1997), to represent matrices. With steady-state detection enabled, the hybrid approach requires storage
of three double precision floating point vectors of size |S|, which for a memory limit of 800 MB means
that systems with at most 35 million states can be analyzed. An alternative to symbolic data structures
is sparse matrices. The space complexity is the same for both representations, and sparse matrices nearly
always provide faster numerical computation, but symbolic representations of rate and probability matrices
can exploit structure in the model and therefore require less memory in practice (Kwiatkowska et al. 2004).
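The 35-million figure can be checked with a line of arithmetic. The check assumes MB means 2^20 bytes and ignores the memory used by the symbolic matrix representation.

```python
MB = 2 ** 20  # assumption: 1 MB = 2**20 bytes
memory_limit = 800 * MB
bytes_per_state = 3 * 8  # three double-precision vectors of size |S|
max_states = memory_limit // bytes_per_state
print(max_states)  # 34952533, i.e. about 35 million states
```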

Figure 6.14 compares the performance of the numerical and the statistical solution methods for the tandem
queuing network and symmetric polling system case studies. The truncation error (ε) for the numerical
solution method was set to 10^-10. This error bound cannot be compared directly with the error bounds for
the statistical solution method, but the performance of the numerical method does not vary much with the
choice of ε. We can see clearly that the numerical solution method is faster for small state spaces, but that
the statistical solution method scales better with an increase in the size of the state space. For a fixed model
size, and with increasing time bound, the numerical solution method compares much more favorably: its
verification time remains constant once steady-state detection kicks in. Still, for the symmetric polling system,
the verification time for the statistical solution method remains constant for large time bounds as well, because
all sample trajectories are terminated early when reaching a state satisfying poll_1.

Figure 6.14: Comparison of numerical and statistical probabilistic model checking for the tandem queuing network
(top) and the symmetric polling system (bottom). Panels (a) and (c) show verification time as a function of state
space size; panels (b) and (d) show verification time as a function of time bound. For the statistical solution method,
results are shown for symmetric error bounds α = β equal to 10^-8 (△) and 10^-1 (○).

The numerical solution method has the same asymptotic time complexity for verifying a UTSL formula
in a single state as in all states simultaneously (Katoen et al. 2001). This is a great benefit when dealing with
nested probabilistic operators. Consider the UTSL formula $\mathcal{P}_{\geq 0.9}[\mathcal{P}_{\geq 0.5}[\Diamond^{[0,\tau_2]}\, c]\ \mathcal{U}^{[0,\tau_1]}\ x=n \wedge y=n]$ for
the robot grid world. The time complexity for the numerical solution method is O(q·τ_1·M + q·τ_2·M),
which is essentially the same as O(q·τ_1·M) for τ_2 < τ_1. The statistical solution method, on the other
hand, is definitely more costly in the presence of nested probabilistic operators. Younes et al. (2004) have
suggested a mixed solution method, which uses the numerical approach for nested probabilistic operators
and the statistical approach for top-most probabilistic operators. This mixed approach shares performance
characteristics of both solution methods, but is limited by memory in the same way as the pure numerical
solution method. We can see this in Figure 6.15, where we compare the three solution methods for the robot
grid world case study using two different values of τ_2. For τ_2 = 9, the statistical solution method is slower
for state spaces up to 10^6 or 10^7, but handles much larger state spaces than the other two solution methods
without running out of memory. For τ_2 = 5, the nested probabilistic statement is false in all states. The pure
statistical approach benefits from this fact because sample trajectories will typically not extend beyond the
initial state. The numerical and mixed solution methods scale much worse in this case.

Figure 6.15: Comparison of numerical, mixed, and statistical solution methods for formulae with nested probabilistic
operators, (a) τ_2 = 9 and (b) τ_2 = 5. For the statistical and mixed solution methods, results are shown for δ equal
to 0.025 (△) and 0.05 (▽).

In summary, the empirical results presented in this chapter have shown that the performance of the
statistical solution method depends on several factors, in particular the parameters δ, α, and β. The SPRT
is generally orders of magnitude faster than a single sampling plan with the same strength, although there
are exceptions to this rule. Memoization is important for making a pure statistical approach tractable in the
presence of nested probabilistic operators. Numerical solution methods are faster than statistical methods
for smaller state spaces, and can benefit greatly from the use of steady-state detection, but statistical methods
scale better as the size of the state space increases.


Chapter  7 


Probabilistic Verification for “Black-Box” Systems

So far, we have assumed that a model is available of the system that we want to verify. Given a model,
we can apply either numerical or statistical solution methods for probabilistic model checking. Numerical
techniques provide highly accurate results, but rely on strong assumptions regarding the dynamics of the
systems they are used to analyze. Statistical techniques require only that the dynamics of a system can be
simulated, and can therefore be used for a larger class of stochastic processes. The results produced by
statistical methods are only probabilistic, however, and attaining high accuracy tends to be costly.

For some systems, it may not even be feasible to assume that we can simulate their behavior. Sen et al.
(2004) consider the verification problem for such “black-box” systems. Here, “black-box” means that the
system cannot be controlled to generate execution traces, or trajectories, on demand starting from arbitrary
states. This is a reasonable assumption, for instance, for a system that has already been deployed and for
which we are given only a set of trajectories generated during actual execution of the system. We are then
asked to verify a probabilistic property of the system based on the information provided to us as a fixed set
of trajectories. Statistical solution techniques are certainly required to solve this problem. The statistical
method described in Chapter 4 cannot be used to verify “black-box” systems, however, because it depends
on the ability to generate trajectories on demand.

Sen et al. (2004) present an alternative solution method for verification of “black-box” systems based
on statistical hypothesis testing with fixed sample sizes. In this chapter, we improve upon their algorithm by
making sure to always accept the most likely hypothesis, and we correct their procedure for verifying nested
probabilistic properties. Differences between the two approaches are discussed in detail towards the end of
this chapter.

The algorithm we present for verification of “black-box” systems can handle the full logic UTSL, including
properties without finite time bounds, although the accuracy of the result for such properties may
be poor. Our algorithm, like that of Sen et al. (2004), makes no guarantees regarding accuracy. Instead
of respecting some a priori bounds on the probability of error, the algorithm computes a p-value for the
result, which is a measure of confidence. This is really the best we can do, given that we cannot generate
trajectories for the system as we see fit and instead are restricted to using a predetermined set of trajectories.

The algorithm presented in this chapter is complementary to the statistical model checking algorithm
presented in Chapter 5, and is useful under different assumptions. If we cannot generate trajectories for
a system on demand, then the algorithm presented here still allows us to reach conclusions regarding the
behavior of the system. If, however, we have a model of a system so that we can simulate its dynamics, then
we are better off with the approach of Chapter 5, as it gives us full control over the probability of obtaining
an incorrect result.

7.1  “Black-Box”  Probabilistic  Systems  and  Verification 

Formally,  we  define  a  “black-box”  probabilistic  system  in  terms  of  what  we  know  (or  rather,  do  not  know) 
regarding  the  probability  measure  over  sets  of  trajectories. 

Definition 7.1 (“Black-Box” Probabilistic System). A “black-box” probabilistic system is a stochastic
discrete event system for which the probability measure μ over sets of trajectories with common prefix is
not fully specified and cannot be sampled from.

We thus refer to a stochastic discrete event system 𝓜 as a “black-box” system if we lack an exact
definition of the probability measure μ over sets of trajectories of 𝓜. We assume that we cannot even
sample trajectories according to μ, as stated in Definition 7.1. Thus, in order to solve a verification problem
(𝓜, μ_0, Φ) for a “black-box” system 𝓜, we must rely on an external source to provide a sample set of n
trajectories for 𝓜 that is representative of the probability measure μ and the initial state distribution μ_0.
We further assume that we are provided only with truncated trajectories, because infinite trajectories would
require infinite memory to store.

We will use statistical hypothesis testing to verify properties of a “black-box” system given a sample of
n truncated trajectories. Since we rely on statistical techniques, we will typically not know with certainty if
the result we produce is correct. The method we present for verification of “black-box” systems computes
a p-value for a verification result, which is a value in the interval [0, 1] with values closer to 0 representing
higher confidence in the result and a p-value of 0 representing certainty (Hogg and Craig 1978, pp. 255-256).
We start by assuming that Φ is free of nested probabilistic operators. Later on, we consider UTSL formulae
with nested probabilistic operators, which, just as with regular statistical probabilistic model checking, cannot
be handled in a meaningful way without making rather strong assumptions regarding the dynamics of the
“black-box” system.

7.1.1  Verification  without  Nested  Probabilistic  Operators 

Given a state s, verification of a UTSL formula x ⋈ v is trivial. We can simply read the value assigned
to x in state s and compare it to v. We consider the remaining three cases in more detail, starting with
the probabilistic operator $\mathcal{P}_{\bowtie\theta}[\varphi]$. Recall that the objective is to produce a Boolean result annotated with a
p-value.

Probabilistic  Operator 

Consider the problem of verifying the UTSL formula $\mathcal{P}_{\bowtie\theta}[\varphi]$ in state s of a stochastic discrete event system
𝓜. As before, let X_i be a random variable representing the verification of the path formula φ over a
trajectory for 𝓜 drawn according to the probability measure μ over Path({(s, 0)}). If we choose X_i = 1 to
represent the fact that φ holds over a random trajectory, and X_i = 0 to represent the opposite fact, then
X_i is a Bernoulli variate with parameter $p = \mu(\{\sigma \in \mathit{Path}(\{(s,0)\}) \mid \sigma, 0 \models \varphi\})$, i.e. Pr[X_i = 1] = p
and Pr[X_i = 0] = 1 − p. In order to verify $\mathcal{P}_{\bowtie\theta}[\varphi]$, we can make observations of X_i and use statistical
hypothesis testing to determine if p ⋈ θ is likely to hold. An observation of X_i, denoted x_i, is the verification
of φ over a specific trajectory σ_i. If σ_i satisfies the path formula φ, then x_i = 1, otherwise x_i = 0.

In our case, we are given n truncated trajectories for a “black-box” system that we can use to generate
observations of X_i. Each observation is obtained by verifying the path formula φ over one of the truncated
trajectories. This is straightforward given a truncated trajectory $\{(s_0,t_0),\ldots,(s_{k-1},t_{k-1}),s_k\}$, provided
that φ does not contain any probabilistic operators. For $\varphi = \mathcal{X}^I\,\Phi$, we just check if t_0 ∈ I and s_1 ⊨ Φ.
For $\varphi = \Phi\ \mathcal{U}^I\ \Psi$, we traverse the trajectory until we find a state s_i such that one of the following conditions
holds, with T_i defined as in (2.18) to be the time at which state s_i is entered:

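1. $(s_i \models \neg\Phi) \wedge ((T_i \notin I) \vee (s_i \models \neg\Psi))$

2. $(T_i \in I) \wedge (s_i \models \Psi)$

3. $((T_i, T_{i+1}) \cap I \neq \emptyset) \wedge (s_i \models \Phi) \wedge (s_i \models \Psi)$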

In the first case, $\Phi \,\mathbf{U}^I\, \Psi$ does not hold over the trajectory, while in the last two cases the time-bounded
until formula does hold. This is the same procedure as was used in Chapter 5 for generating observations
for the verification of probabilistic statements. Note, however, that in this case we may not always be able
to determine the value of $\varphi$ over all trajectories, because the trajectories that are provided to us are assumed
to be truncated. Previously, we assumed that we could always generate a sufficient prefix of a trajectory so
that the truth value of a path formula could be determined.
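The traversal can be sketched in Python to make the three conditions concrete. This is a minimal illustration under assumptions of our own: a truncated trajectory is passed as a list `prefix` of `(state, holding_time)` pairs for $s_0, \ldots, s_{k-1}$ together with the final state `last`, whose holding time is unknown; I is a closed interval `[lo, hi]`; holding times are positive; and $\Phi$ and $\Psi$ are Python predicates. The name `check_until` is hypothetical. Besides the three listed conditions, the sketch also returns false once a state is entered after $\sup I$, the other decisive event mentioned later in this section; if nothing decisive happens before the data run out, it returns `None` to signal an undetermined observation.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Pred = Callable[[State], bool]


def check_until(prefix: Sequence[Tuple[State, float]], last: State,
                phi: Pred, psi: Pred, lo: float, hi: float) -> Optional[bool]:
    """Verify phi U^[lo,hi] psi over a truncated trajectory (hypothetical helper).

    prefix holds (s_i, t_i) pairs, t_i > 0 being the time spent in s_i; last is
    s_k, whose holding time is unknown. Returns True or False when the trajectory
    is decisive, and None when truncation leaves the value undetermined.
    """
    steps: List[Tuple[State, Optional[float]]] = list(prefix) + [(last, None)]
    entered = 0.0  # T_i, the time at which the current state s_i is entered
    for s, t in steps:
        in_interval = lo <= entered <= hi
        if in_interval and psi(s):                          # condition 2
            return True
        if not phi(s) and (not in_interval or not psi(s)):  # condition 1
            return False
        if entered > hi:
            return False  # assumption: time has passed every value in I
        if t is None:
            return None   # truncated: (T_k, T_{k+1}) is unknown
        left = entered + t  # T_{i+1}
        # condition 3: the open interval (T_i, T_{i+1}) meets [lo, hi]
        if entered < hi and left > lo and phi(s) and psi(s):
            return True
        entered = left
    return None  # not reached: the loop always returns at the final state
```

The three-valued result is deliberate: collapsing `None` into `False` would silently reintroduce the bias discussed below.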

We consider the case $\mathcal{P}_{\geq\theta}[\varphi]$ in detail, noting that $\mathcal{P}_{\leq\theta}[\varphi]$ can be handled in the same way simply by
reversing the value of each observation and replacing $\theta$ with $1 - \theta$. We want to test the hypothesis $H_0 : p \geq \theta$ against
the alternative hypothesis $H_1 : p < \theta$ by using the n observations $x_1, \ldots, x_n$ of the Bernoulli variates $X_1, \ldots, X_n$. To
do so, we specify a constant c. If $\sum_{i=1}^{n} x_i$ is greater than c, then hypothesis $H_0$ is accepted, i.e. $\mathcal{P}_{\geq\theta}[\varphi]$ is
determined to hold. Otherwise, if the given sum is at most c, then hypothesis $H_1$ is accepted, meaning that
$\mathcal{P}_{\geq\theta}[\varphi]$ is determined not to hold. The constant c should be chosen so that it becomes roughly equally likely
to accept $H_0$ as $H_1$ if p equals $\theta$. The pair $(n, c)$ is a single sampling plan, as described in Section 2.2.

We know from before that by using a single sampling plan $(n, c)$, we accept hypothesis $H_1$ with probability
$F(c; n, p)$, and consequently hypothesis $H_0$ is accepted with probability $1 - F(c; n, p)$. Ideally, we
should choose c such that $F(c; n, \theta) = 0.5$, but it is not always possible to attain equality because the binomial
distribution is a discrete distribution. The best we can do is to choose c such that $|F(c; n, \theta) - 0.5|$ is
minimized. We can readily compute the desired c using (2.3).

We now have a way to decide whether to accept or reject the hypothesis that $\mathcal{P}_{\geq\theta}[\varphi]$ holds, but we also
want to report a value reflecting the confidence in our decision. For this purpose, we compute the p-value
for a decision. The p-value is defined as the probability of the sum of observations being at least as extreme
as the one obtained, provided that the hypothesis that was not accepted holds. The p-value for accepting
$H_0$ when $\sum_{i=1}^{n} x_i = d$ is $\Pr[\sum_{i=1}^{n} X_i \geq d \mid p < \theta] \leq F(n - d; n, 1 - \theta) = 1 - F(d - 1; n, \theta)$, while
the p-value for accepting $H_1$ is $\Pr[\sum_{i=1}^{n} X_i \leq d \mid p \geq \theta] \leq F(d; n, \theta)$. The following theorem provides
justification for our choice of the constant c.

Theorem 7.1 (Minimization of p-value). By choosing c to minimize $|F(c; n, \theta) - 0.5|$ when testing $H_0 : p \geq \theta$
against $H_1 : p < \theta$ using a single sampling plan $(n, c)$, the hypothesis with the lowest p-value is
always accepted.

Proof. Hypothesis $H_1$ is accepted only if $d \leq c$, which means that the p-value for $H_1$ under these circumstances
is at most $F(c; n, \theta)$. The p-value for $H_0$ if $d \leq c$ would be at least $1 - F(c - 1; n, \theta)$. We know that
$F(c - 1; n, \theta) \leq F(c; n, \theta)$ and by assumption that $|F(c - 1; n, \theta) - 0.5| \geq |F(c; n, \theta) - 0.5|$. It follows that
$F(c; n, \theta) \leq 1 - F(c - 1; n, \theta)$ as required. For $d > c$, the p-value for acceptance of $H_1$ would be at least
$F(c + 1; n, \theta)$. The p-value for acceptance of $H_0$ when $d > c$, on the other hand, is at most $1 - F(c; n, \theta)$.
We know that $F(c + 1; n, \theta) \geq F(c; n, \theta)$ and by assumption that $|F(c + 1; n, \theta) - 0.5| \geq |F(c; n, \theta) - 0.5|$.
Consequently, $1 - F(c; n, \theta) \leq F(c + 1; n, \theta)$ and our choice of c ensures that the hypothesis with the lowest
p-value is always accepted. □

In  practice,  it  is  unnecessary  to  compute  c.  It  is  more  convenient  simply  to  compute  the  p-value  of  each 
hypothesis  and  accept  the  hypothesis  with  the  lowest  p-value. 
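The decision procedure can be sketched in a few lines of Python. The helper names are hypothetical: `binom_cdf` implements $F(c; n, p)$, `choose_c` picks the plan constant minimizing $|F(c; n, \theta) - 0.5|$, and `p_values` returns the two bounds derived above. The final loop checks Theorem 7.1 numerically for one plan ($n = 100$, $\theta = 0.9$): for every possible sum d, the single sampling plan and the lowest-p-value rule accept the same hypothesis.

```python
from math import comb
from typing import Tuple


def binom_cdf(c: int, n: int, p: float) -> float:
    """F(c; n, p) = Pr[Bin(n, p) <= c], summing the tail away from the mean."""
    if c < 0:
        return 0.0
    if c >= n:
        return 1.0

    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if c < n * p:
        return sum(pmf(k) for k in range(c + 1))
    return 1.0 - sum(pmf(k) for k in range(c + 1, n + 1))


def choose_c(n: int, theta: float) -> int:
    """Constant of the plan (n, c): minimizes |F(c; n, theta) - 0.5|."""
    return min(range(n + 1), key=lambda c: abs(binom_cdf(c, n, theta) - 0.5))


def p_values(d: int, n: int, theta: float) -> Tuple[float, float]:
    """p-values for accepting H0: p >= theta and H1: p < theta given sum d."""
    pv_h0 = 1.0 - binom_cdf(d - 1, n, theta)  # = F(n - d; n, 1 - theta)
    pv_h1 = binom_cdf(d, n, theta)
    return pv_h0, pv_h1


def decide(d: int, n: int, theta: float) -> Tuple[bool, float]:
    """Accept the hypothesis with the lowest p-value; True means P>=theta holds."""
    pv_h0, pv_h1 = p_values(d, n, theta)
    return (True, pv_h0) if pv_h0 < pv_h1 else (False, pv_h1)


# Numerical check of Theorem 7.1 for one plan: both rules agree for every d.
n, theta = 100, 0.9
c = choose_c(n, theta)
for d in range(n + 1):
    assert decide(d, n, theta)[0] == (d > c)
```

As the text suggests, `decide` never needs c; the loop merely confirms that nothing is lost by skipping it.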

Example 7.1. Consider the problem of verifying the UTSL formula $\Phi = \mathcal{P}_{\geq 0.9}[\Diamond^{[0,100]}\, x=1]$ in a state
satisfying x=0 for a “black-box” system that in reality is the continuous-time Markov process shown in
Figure 7.1. The probability measure of trajectories starting in state x=0 and satisfying $\Diamond^{[0,100]}\, x=1$ is
$1 - e^{-1} \approx 0.63$ for this system, so the UTSL formula does not hold, but we would of course not know
this unless we had access to the model. Assume that we are given a set of 100 truncated trajectories,
of which 63 satisfy the path formula $\Diamond^{[0,100]}\, x=1$ and 37 do not satisfy the given path formula. Thus,
$n = 100$ and $d = 63$. The p-value for $H_0$ is $1 - F(62; 100, 0.9) \approx 1 - 10^{-13}$, while the p-value for $H_1$ is
$F(63; 100, 0.9) \approx 5.48 \cdot 10^{-13}$. The hypothesis with the lowest p-value is $H_1$, so we conclude that $\Phi$ does
not hold.

Figure 7.1: A simple two-state continuous-time Markov process.

In the analysis so far we have assumed that the value of $\varphi$ can be determined over all n truncated
trajectories. Now, consider the case where we are unable to verify the path formula $\varphi$ over some of the
n truncated trajectories. This would happen if we are verifying $\Phi \,\mathbf{U}^I\, \Psi$ over a trajectory that has been
truncated before either $\neg\Phi \vee \Psi$ is satisfied or time exceeds all values in I. We cannot simply ignore such
trajectories: it is assumed that the entire set of n trajectories is representative of the measure $\mu$, but the subset
of truncated trajectories for which we can determine the value of $\varphi$ is not guaranteed to be a representative
sample for this measure.

Example 7.2. Consider the same problem as in Example 7.1. Assume that we are provided with a set of
100 truncated trajectories for the system, and that all trajectories have been truncated before time 50. Some
of these trajectories, on average roughly 39 in every 100, will satisfy the path formula $\Diamond^{[0,100]}\, x=1$, while
the remaining truncated trajectories will not contain sufficient information for us to determine the validity
of the path formula over these trajectories. An analysis based solely on the trajectories over which the
path formula can be decisively verified would be severely biased. If the number of positive observations
is exactly 39, with 61 undetermined observations, we would wrongly conclude that $\Phi$ holds with p-value
$1 - F(38; 39, 0.9) \approx 0.0164$, which implies a fairly high confidence in the result.
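The bias in Example 7.2 is easy to reproduce with a small simulation. The sketch assumes that the process in Figure 7.1 leaves x=0 through a single exponentially distributed transition with rate 0.01; we infer this rate from the stated probability $1 - e^{-1}$ over 100 time units, since the figure itself is not reproduced here. The helpers `observe` and `sample` are hypothetical; `observe` returns 1, 0, or `None` for an undetermined observation.

```python
import random
from typing import List, Optional

RATE = 0.01  # assumed rate of the x=0 -> x=1 transition (inferred, not given)


def observe(rng: random.Random, truncate_at: float, bound: float) -> Optional[int]:
    """Value of the path formula <>^[0,bound] x=1 over one truncated trajectory."""
    jump = rng.expovariate(RATE)  # time at which x=1 is entered
    if jump <= min(bound, truncate_at):
        return 1     # x=1 entered within [0, bound] before truncation
    if truncate_at > bound:
        return 0     # the whole interval was observed without reaching x=1
    return None      # truncated first: the value cannot be determined


def sample(n: int, truncate_at: float, bound: float = 100.0,
           seed: int = 0) -> List[Optional[int]]:
    rng = random.Random(seed)
    return [observe(rng, truncate_at, bound) for _ in range(n)]


obs = sample(100, truncate_at=50.0)
positive, undetermined = obs.count(1), obs.count(None)
# No observation is ever negative here, so keeping only the decided
# trajectories gives n' = d' = positive: an apparent success rate of 100%.
```

With truncation at time 50 no trajectory can refute the path formula, which is exactly why discarding the undetermined ones skews the estimate upward.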

Let $n'$ be the number of observations whose value we can determine and let $d'$ be the sum of these $n'$
observations. We then know that the sum of all observations, d, is at least $d'$ and at most $d' + n - n'$. If $d' > c$,
then hypothesis $H_0$ can be safely accepted. Instead of a single p-value, we associate an interval of possible
p-values with the result: $[F(n' - d'; n, 1 - \theta), F(n - d'; n, 1 - \theta)]$. Conversely, if $d' + n - n' \leq c$, then
hypothesis $H_1$ can be accepted with p-value in the interval $[F(d'; n, \theta), F(d' + n - n'; n, \theta)]$. If, however,
$d' \leq c$ and $d' + n - n' > c$, then it is not clear which hypothesis should be accepted. We could in this case
say that we do not have enough information to make an informed choice. Alternatively, we could accept
one of the hypotheses with its associated p-value interval. We prefer to always make some choice, and we
recommend choosing $H_0$ if $F(n - d'; n, 1 - \theta) < F(d' + n - n'; n, \theta)$ and $H_1$ otherwise. This strategy
minimizes the maximum possible p-value. Alternatively, we could minimize the minimum possible p-value
by instead choosing $H_0$ if $F(n' - d'; n, 1 - \theta) < F(d'; n, \theta)$ and $H_1$ otherwise. Note that this way of treating
truncated trajectories makes our approach work even for unbounded until formulae $\Phi \,\mathbf{U}\, \Psi$, although we
would typically expect the result to be highly uncertain for such formulae.

Example 7.3. Consider the same situation as in Example 7.2, with 39 positive and 61 undetermined observations.
The p-value for accepting the UTSL formula $\Phi = \mathcal{P}_{\geq 0.9}[\Diamond^{[0,100]}\, x=1]$ as true lies in the interval
$[F(0; 100, 0.1), F(61; 100, 0.1)] \approx [2.65 \cdot 10^{-5}, 1 - 3.77 \cdot 10^{-15}]$. For the opposite decision, we get the
p-value interval $[F(39; 100, 0.9), F(100; 100, 0.9)] \approx [1.59 \cdot 10^{-35}, 1]$. Both intervals are almost equally
uninformative, so no matter what decision we make, we will have a high uncertainty in the result. We would
accept $\Phi$ as true if we prefer to minimize the maximum possible p-value, and we would reject $\Phi$ as false if
we instead prefer to minimize the minimum possible p-value, but in both cases we have a maximum p-value
well above 0.5. This is in sharp contrast to the faulty analysis suggested in Example 7.2, which led to an
acceptance of $\Phi$ as true with a low p-value.

Composite  State  Formulae 

To verify $\neg\Phi$, we first verify $\Phi$. If we conclude that $\Phi$ has a certain truth value with p-value $pv$, then we
conclude that $\neg\Phi$ has the opposite truth value with the same p-value. To motivate this, consider the case
$\neg\mathcal{P}_{\geq\theta}[\varphi]$. To verify $\mathcal{P}_{\geq\theta}[\varphi]$, we test the hypothesis $H_0 : p \geq \theta$ against $H_1 : p < \theta$ as stated above.
Note, however, that $\neg\mathcal{P}_{\geq\theta}[\varphi] \equiv \mathcal{P}_{<\theta}[\varphi]$, which could be posed as the problem of testing the hypothesis
$H'_0 : p < \theta$ against $H'_1 : p \geq \theta$. Since $H'_0 = H_1$ and $H'_1 = H_0$, we can simply negate the result of verifying
$\mathcal{P}_{\geq\theta}[\varphi]$ while maintaining the same p-value.

For a conjunction $\Phi \wedge \Psi$, we have to consider four cases. First, if we verify $\Phi$ to hold with p-value $pv_\Phi$,
and $\Psi$ to hold with p-value $pv_\Psi$, then we conclude that $\Phi \wedge \Psi$ holds with p-value $\max(pv_\Phi, pv_\Psi)$. Second,
if we verify $\Phi$ not to hold with p-value $pv_\Phi$, while verifying that $\Psi$ holds, then we conclude that $\Phi \wedge \Psi$ does
not hold with p-value $pv_\Phi$. The third case is analogous to the second with $\Phi$ and $\Psi$ interchanged. Finally, if
we verify $\Phi$ not to hold with p-value $pv_\Phi$ and $\Psi$ not to hold with p-value $pv_\Psi$, then we conclude that $\Phi \wedge \Psi$
does not hold with p-value $\min(pv_\Phi, pv_\Psi)$. This is similar to the result of Theorem 5.4, but for p-values
instead of bounds on the type I and II error probabilities.
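The negation and conjunction rules translate directly into combinators over (truth value, p-value) pairs. The sketch below is illustrative only; the names `Result`, `negate`, and `conjoin` are hypothetical.

```python
from typing import Tuple

Result = Tuple[bool, float]  # (truth value, p-value)


def negate(r: Result) -> Result:
    """Negation flips the truth value and keeps the p-value."""
    holds, pv = r
    return (not holds, pv)


def conjoin(a: Result, b: Result) -> Result:
    """Combine the results for Phi and Psi into a result for Phi AND Psi."""
    (ha, pa), (hb, pb) = a, b
    if ha and hb:
        return (True, max(pa, pb))        # both conjuncts must hold
    if ha != hb:
        return (False, pb if ha else pa)  # only the failing conjunct matters
    return (False, min(pa, pb))           # two witnesses of falsity
```

Because the conjunct results may come from the same trajectories, the rules never assume independence; they only use the bound (7.2) justified below.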

Before proving the results above, let us give an intuitive justification. In order for $\Phi \wedge \Psi$ to hold, both $\Phi$
and $\Psi$ must hold, so we cannot be any more confident in the result for $\Phi \wedge \Psi$ than we are in the result for the
individual conjuncts, thus the maximum in the first case. To conclude that $\Phi \wedge \Psi$ does not hold, however,
we only need to be convinced that one of the conjuncts does not hold. In case we think exactly one of the
conjuncts holds, the result for the conjunction will be based solely on our conviction that the other conjunct
does not hold, and the p-value for the conjunct we think holds should not matter. This covers the second and
third cases. In the fourth case, we have two sources (not necessarily independent) telling us that the conjunction
is false. We therefore have no reason to be less confident in the result for the conjunction than in the result for
each of the conjuncts, hence the minimum in this case.

For a mathematical derivation of the given expressions, we consider the formula $\mathcal{P}_{\geq\theta_1}[\varphi_1] \wedge \mathcal{P}_{\geq\theta_2}[\varphi_2]$.
Let $d_i$ denote the number of trajectories that satisfy $\varphi_i$. Provided we accept the conjunction as true, which
means we accept each conjunct as true, the p-value for this result is

(7.1) $\Pr\Big[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \;\Big|\; p_1 < \theta_1 \vee p_2 < \theta_2\Big]$.

To compute this p-value, consider the three ways in which $p_1 < \theta_1 \vee p_2 < \theta_2$ can be satisfied (cf. Sen et al.
2004). We know from elementary probability theory (Lemma 5.3) that

(7.2) $\Pr[A \wedge B] \leq \min(\Pr[A], \Pr[B])$

for arbitrary events A and B. From this fact, and assuming that $pv_i$ is the p-value associated with the
verification result for $\mathcal{P}_{\geq\theta_i}[\varphi_i]$, we derive the following:

1. $\Pr\big[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 < \theta_1 \wedge p_2 < \theta_2\big] \leq \min(pv_1, pv_2)$

2. $\Pr\big[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 < \theta_1 \wedge p_2 \geq \theta_2\big] \leq \min(pv_1, 1) = pv_1$

3. $\Pr\big[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 \geq \theta_1 \wedge p_2 < \theta_2\big] \leq \min(1, pv_2) = pv_2$

We take the maximum over these three cases to obtain a bound for (7.1), which gives us $\max(pv_1, pv_2)$.

For the same formula, but now assuming we have verified both conjuncts to be false, we compute the
p-value as

(7.3) $\Pr\Big[\sum_{i=1}^{n} X_i^{(1)} \leq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \leq d_2 \;\Big|\; p_1 \geq \theta_1 \wedge p_2 \geq \theta_2\Big]$.

It follows immediately from (7.2) that $\min(pv_1, pv_2)$ is a bound for (7.3), which is the desired result.
7.1.2 Verification with Nested Probabilistic Operators

If we allow nested probabilistic operators, verification of UTSL formulae for “black-box” stochastic discrete
event systems becomes much harder. Consider the formula $\mathcal{P}_{\geq\theta}[\Diamond^{[0,100]}\, \mathcal{P}_{\geq\theta'}[\varphi]]$. In order to verify this
formula, we must test if $\mathcal{P}_{\geq\theta'}[\varphi]$ holds at some time $t \in [0, 100]$ along the set of trajectories that we are
given. Unless the time domain $\mathcal{T}$ is such that there is a finite number of time points in a finite interval,
we potentially have to verify $\mathcal{P}_{\geq\theta'}[\varphi]$ at an infinite or even uncountable number of points along a trajectory,
which clearly is infeasible. We made the same observation regarding verification of systems for which we
can generate trajectories on demand. The situation is even worse, however, for “black-box” systems. Even
if $\mathcal{T} = \mathbb{Z}^*$, so that we only have to verify nested probabilistic formulae at a finite number of points, we
still have to take the entire prefix of the trajectory into account at each time point. We are given a fixed
set of trajectories, and we can use only the subset of trajectories with a matching prefix to verify a nested
probabilistic formula. It is thus likely that we will have few trajectories available to use for verifying nested
probabilistic formulae. In the worst case, there will be only a single matching prefix, in which case the
uncertainty in the result will be overwhelming.

We can get around this problem by assuming that the “black-box” system is a Markov process. Under
the Markov assumption, as mentioned earlier, we only have to take the last state along a trajectory prefix
into consideration. Consequently, any suffix of a truncated trajectory starting at a specific state s can be
regarded as representative of the probability measure $\mu$ on $\mathit{Path}(\{(s, 0)\})$. This makes more trajectories
available for the verification of nested probabilistic formulae.
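Collecting these suffixes is a simple scan over the given data. In the minimal sketch below, each truncated trajectory is assumed to be a pair consisting of a list of `(state, holding_time)` entries and a final state whose holding time is unknown; `suffixes_from` is a hypothetical helper. Because holding times are relative, a suffix needs no time shift to be read as a trajectory that starts at time 0 in s. The reuse is justified only under the Markov assumption just stated.

```python
from typing import Hashable, List, Sequence, Tuple

State = Hashable
Truncated = Tuple[List[Tuple[State, float]], State]  # (prefix, final state)


def suffixes_from(trajectories: Sequence[Truncated], s: State) -> List[Truncated]:
    """Every suffix that begins when some trajectory enters state s.

    Only valid under the Markov assumption: each suffix is then treated as a
    fresh sample from the measure on paths starting in s.
    """
    out: List[Truncated] = []
    for prefix, last in trajectories:
        for i, (state, _) in enumerate(prefix):
            if state == s:
                out.append((prefix[i:], last))
        if last == s:
            out.append(([], last))  # a suffix consisting of s alone
    return out
```

Note that suffixes of the same trajectory overlap, so the resulting observations are not independent even when the Markov assumption holds.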

Another complicating factor in the verification of $\mathcal{P}_{\geq\theta}[\varphi]$, where $\varphi$ contains nested probabilistic operators,
is that we cannot verify $\varphi$ over trajectories without some uncertainty in the result. This means that we
no longer obtain observations of the random variables $X_i$, as defined above, but instead we observe some
other random variables $Y_i$, related to $X_i$ through bounds on the observation error.

To compute a p-value for nested verification, we assume that $\Pr[Y_i = 0 \mid X_i = 1] \leq \alpha$ and
$\Pr[Y_i = 1 \mid X_i = 0] \leq \beta$. We can make this assumption if we introduce indifference regions in the verification
of nested probabilistic formulae and use the procedure described in Chapter 5 to verify path formulae over
truncated trajectories. By Lemma 5.7, we have the following bounds: $p(1 - \alpha) \leq \Pr[Y_i = 1] \leq 1 - (1 - p)(1 - \beta)$.
The p-value for accepting $\mathcal{P}_{\geq\theta}[\varphi]$ as true when the sum of the observations is d is
$\Pr[\sum_{i=1}^{n} Y_i \geq d \mid p < \theta] \leq F(n - d; n, (1 - \theta)(1 - \beta))$. The p-value for the opposite decision is
$\Pr[\sum_{i=1}^{n} Y_i \leq d \mid p \geq \theta] \leq F(d; n, \theta(1 - \alpha))$. Since $F(d; n, p)$ increases as p decreases, we see that the
p-value increases as the error bounds $\alpha$ and $\beta$ increase, which makes perfect sense. As was suggested earlier, we
can minimize the p-value of the verification result by computing the p-values of both hypotheses and accepting
the one with the lowest p-value.
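The nested p-values can be computed in the same way as the unnested ones. In this sketch, $\alpha$ and $\beta$ are inputs whose derivation from indifference regions is described below; `nested_p_values` and `nested_decide` are hypothetical names, and $F(c; n, p)$ is implemented inline so the block runs on its own. With $\alpha = \beta = 0$ the bounds reduce to those for a probabilistic operator without nesting.

```python
from math import comb
from typing import Tuple


def binom_cdf(c: int, n: int, p: float) -> float:
    """F(c; n, p) = Pr[Bin(n, p) <= c], summing the tail away from the mean."""
    if c < 0:
        return 0.0
    if c >= n:
        return 1.0

    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if c < n * p:
        return sum(pmf(k) for k in range(c + 1))
    return 1.0 - sum(pmf(k) for k in range(c + 1, n + 1))


def nested_p_values(d: int, n: int, theta: float, alpha: float,
                    beta: float) -> Tuple[float, float]:
    """p-value bounds for H0 (P>=theta holds) and H1 when each observation Y_i
    may be wrong: Pr[Y=0 | X=1] <= alpha and Pr[Y=1 | X=0] <= beta."""
    pv_h0 = binom_cdf(n - d, n, (1 - theta) * (1 - beta))
    pv_h1 = binom_cdf(d, n, theta * (1 - alpha))
    return pv_h0, pv_h1


def nested_decide(d: int, n: int, theta: float, alpha: float,
                  beta: float) -> Tuple[bool, float]:
    """Accept the hypothesis with the lower p-value bound."""
    pv_h0, pv_h1 = nested_p_values(d, n, theta, alpha, beta)
    return (True, pv_h0) if pv_h0 < pv_h1 else (False, pv_h1)
```

Raising `alpha` or `beta` can only increase both bounds, matching the monotonicity argument above.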

We can let the user specify a parameter $\delta_0$ that controls the relative width of the indifference regions. A
nested probabilistic formula $\mathcal{P}_{\geq\theta}[\varphi]$ is verified with an indifference region of half-width $\delta = \delta_0 \theta$ if $\theta < 0.5$
and $\delta = \delta_0(1 - \theta)$ otherwise. The verification is carried out using acceptance sampling as before, but
with hypotheses $H_0 : p \geq \theta + \delta$ and $H_1 : p \leq \theta - \delta$. Instead of reporting a p-value, as is done for
top-level probabilistic operators, we report a bound on the type I error probability of the sampling plan in
use if $H_1$ is accepted and a bound on the type II error probability if $H_0$ is accepted. In our case, assuming a
sampling plan $(n, c)$ is used, $H_1$ is accepted with probability $F(c; n, p)$, so the type I error bound is
$F(c; n, \theta + \delta)$ and the type II error bound is $1 - F(c; n, \theta - \delta)$.
The difference from the procedure described in Chapter 5 is that we compute the error bounds that we can
achieve for subformulae with a fixed sample size instead of computing the sample size required to achieve
certain error bounds. We can then use Theorem 5.4 to compute error bounds for composite UTSL formulae
and path formulae with an until operator. As error bounds for the computation of the p-value for a top-level
probabilistic operator, we simply take the maximum error bounds for the verification of the path formula
over all trajectories.
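The indifference-region bookkeeping for a nested operator can be sketched as follows. The names are hypothetical and $F(c; n, p)$ is again implemented inline. The plan $(n, c)$ is taken as given; since $H_1$ is accepted exactly when the sum of the observations is at most c, i.e. with probability $F(c; n, p)$, the type I error is at most $F(c; n, \theta + \delta)$ and the type II error at most $1 - F(c; n, \theta - \delta)$.

```python
from math import comb
from typing import Tuple


def binom_cdf(c: int, n: int, p: float) -> float:
    """F(c; n, p) = Pr[Bin(n, p) <= c], summing the tail away from the mean."""
    if c < 0:
        return 0.0
    if c >= n:
        return 1.0

    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if c < n * p:
        return sum(pmf(k) for k in range(c + 1))
    return 1.0 - sum(pmf(k) for k in range(c + 1, n + 1))


def half_width(theta: float, delta0: float) -> float:
    """Half-width delta of the indifference region around theta."""
    return delta0 * theta if theta < 0.5 else delta0 * (1 - theta)


def error_bounds(n: int, c: int, theta: float,
                 delta0: float) -> Tuple[float, float]:
    """(type I, type II) error bounds of plan (n, c) for H0: p >= theta + delta
    against H1: p <= theta - delta."""
    delta = half_width(theta, delta0)
    type1 = binom_cdf(c, n, theta + delta)        # accept H1 though p >= theta + delta
    type2 = 1.0 - binom_cdf(c, n, theta - delta)  # accept H0 though p <= theta - delta
    return type1, type2
```

Scaling $\delta$ by $\theta$ or $1 - \theta$ keeps $\theta \pm \delta$ inside $[0, 1]$ whenever $\delta_0 \leq 1$.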


7.2  Comparison  with  Related  Work 

The  idea  of  using  statistical  hypothesis  testing  for  verification  of  “black-box”  systems  was  first  proposed  by 
Sen  et  al.  (2004).  This  section  highlights  the  differences  between  their  approach  and  the  approach  presented 
in  this  chapter. 

First, consider the verification of a probabilistic formula $\mathcal{P}_{\geq\theta}[\varphi]$. Our approach is essentially the same
as theirs: given a constant c, accept if $\sum_{i=1}^{n} x_i > c$ and reject otherwise. Their choice of c is different,
however, and is essentially based on De Moivre's (1738) normal approximation for the binomial distribution.
Their acceptance condition is $\sum_{i=1}^{n} x_i \geq n\theta$, which corresponds to choosing c to be $\lceil n\theta \rceil - 1$. The mean
of the binomial distribution $B(n, \theta)$ is $n\theta$, so this would be the right thing to do if $\sum_{i=1}^{n} X_i$ could be assumed
to have a normal distribution. De Moivre showed that this is approximately the case for large n if the $X_i$
are Bernoulli variates, but the approximation is poor for moderate values of n or if $\theta$ is not close to 0.5.
As a consequence, their algorithm will under some circumstances accept a hypothesis with a larger p-value
than the alternative hypothesis. By choosing c as we do, without relying on the normal approximation, we
guarantee that the hypothesis with the smallest p-value is always accepted (Theorem 7.1). Consider the
formula $\mathcal{P}_{\geq 0.01}[\varphi]$, for example, with $n = 501$ and $d = 5$. Our procedure would accept the formula as
true with p-value 0.562, while the algorithm of Sen et al. would reject the formula as false with p-value
0.614. The difference is not of great significance, but it is still worth pointing out because it demonstrates the
danger of using the normal approximation for the binomial distribution. With today's fast digital computers,
it is hard to motivate using this approximation.
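The example above can be reproduced with a few lines of Python. The sketch contrasts our choice of c with $\lceil n\theta \rceil - 1$, the constant implied by the acceptance condition of Sen et al.; the helper names are hypothetical, and $F(c; n, p)$ is implemented inline so the block is self-contained.

```python
from math import ceil, comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """F(c; n, p) = Pr[Bin(n, p) <= c], summing the tail away from the mean."""
    if c < 0:
        return 0.0
    if c >= n:
        return 1.0

    def pmf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if c < n * p:
        return sum(pmf(k) for k in range(c + 1))
    return 1.0 - sum(pmf(k) for k in range(c + 1, n + 1))


def choose_c(n: int, theta: float) -> int:
    """Our constant: minimizes |F(c; n, theta) - 0.5| (Theorem 7.1)."""
    return min(range(n + 1), key=lambda c: abs(binom_cdf(c, n, theta) - 0.5))


def sen_c(n: int, theta: float) -> int:
    """Constant implied by accepting iff the sum is at least n * theta."""
    return ceil(n * theta) - 1


n, theta, d = 501, 0.01, 5
pv_h0 = 1.0 - binom_cdf(d - 1, n, theta)  # p-value if the formula is accepted
pv_h1 = binom_cdf(d, n, theta)            # p-value if the formula is rejected
ours_accepts = d > choose_c(n, theta)     # accepts, with p-value pv_h0
sen_accepts = d > sen_c(n, theta)         # rejects, with p-value pv_h1
```

Here $n\theta = 5.01$ sits just above the observed sum, while the skewed distribution puts its median at 5; the normal approximation misses exactly this asymmetry.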

The second improvement over the method presented by Sen et al. is in the calculation of the p-value for
the verification of a conjunction $\Phi \wedge \Psi$ when both conjuncts have been verified to be false. They state that
the p-value is $pv_\Phi + pv_\Psi$, but this is too conservative. There is no reason to believe that the confidence in
the result for $\Phi \wedge \Psi$ would be lower (i.e. the p-value higher) than for either conjunct alone if we are convinced
that both conjuncts are false. We have shown that the p-value in this case is bounded by $\min(pv_\Phi, pv_\Psi)$, which
intuitively makes more sense.

Sen  et  al.,  in  their  handling  of  nested  probabilistic  operators,  confuse  the  p-value  with  the  probability 
of  accepting  a  false  hypothesis  (generally  referred  to  as  the  type  I  or  type  II  error  of  a  sampling  plan). 
The  p-value  is  not  a  bound  on  the  probability  of  a  certain  test  procedure  accepting  a  false  hypothesis.  In 
fact,  the  test  that  both  they  and  we  use  does  not  provide  a  useful  bound  on  the  probability  of  accepting  a 
false  hypothesis.  Their  analysis  relies  heavily  on  the  ability  to  bound  the  probability  of  accepting  a  false 
hypothesis,  and  we  have  presented  a  way  to  provide  such  bounds  by  introducing  indifference  regions  for 
nested  probabilistic  operators. 

In addition, Sen et al. are vague regarding the assumptions needed for their approach to produce reliable
answers. The fact that they treat any portion of a trajectory starting in s, regardless of the portion preceding
s, as a sample from the same distribution hides a rather strong assumption regarding the dynamics of their
“black-box” systems. As we have pointed out, this is not a valid assumption unless we know that the system
is a Markov process. It also appears as if they consider only truncated trajectories over which they can fully
verify a path formula, and this can introduce a bias that may very well invalidate the conclusion reached
regarding the truth value of a probabilistic formula. We have made this clear in our exposition, and we have
presented a sound procedure for handling the fact that the value of a path formula may not be determined
over all the truncated trajectories.

Finally, the empirical analysis offered by Sen et al. gives the reader the impression that a certain p-value
can be guaranteed for a verification result simply by increasing the sample size. This violates the premise of
a “black-box” system stated by the authors themselves earlier in their paper, namely that trajectories cannot
be generated on demand. More important, though, is the fact that a certain p-value can never be guaranteed.
The p-value is not a property of a test, but simply a function of a specific set of observations. If we are
unlucky, we may make observations that give us a large p-value even in cases when this is unlikely. It
is therefore misleading to say that an algorithm for “black-box” verification is “faster” than the statistical
model checking algorithm described in Chapter 5, as the latter algorithm is designed to realize certain a
priori performance characteristics. The empirical results of Sen et al. cannot, in fact, be replicated reliably
because there is no fixed procedure by which one can determine the sample size required to achieve a certain
p-value. Their results give the false impression that their procedure is sequential, i.e. that the sample size
automatically adjusts to the difficulty of attaining a certain p-value, when in reality they selected the reported
sample sizes manually based on prior empirical testing (K. Sen, personal communication, May 20, 2004).


Part  II 

Planning 
Goal Directed Planning


We now turn to the problem of planning for stochastic systems with asynchronous events and actions. In
this chapter, we consider goal directed planning problems. We propose the use of UTSL as a formalism
for specifying plan objectives, and we present a general planning framework based on the Generate, Test
and Debug (GTD) paradigm (Simmons 1988). The goal is to generate a stationary policy, i.e. a mapping
from states to actions, that satisfies a UTSL goal condition. To handle the complexity of asynchronous
events with general delay distributions, we resort to statistical techniques. We use the statistical approach
to UTSL model checking, presented in Chapter 5, to verify policies. During the verification phase, sample
trajectories are generated, which can then be analyzed to find reasons why a policy fails to satisfy the
goal condition. The result of this analysis is used to guide policy debugging.

We use a deterministic temporal planner to help generate the policies. A probabilistic planning problem
is transformed into a deterministic problem by making every possible outcome of events and actions available
to the planning system. The solution is a deterministic plan, from which a policy is generated through
decision tree learning. This policy is typically overly optimistic, and the sample trajectories obtained during
policy verification are used to restrict the subsequent choices that the planning system can make.

8.1  Planning  Framework 

We  present  a  general  framework  for  goal  directed  probabilistic  planning  with  asynchronous  events,  based 
on  the  Generate,  Test  and  Debug  (GTD)  paradigm  proposed  by  Simmons  (1988).  The  domain  model  is  a 
Find-Policy(M, s0, φ)
    π0 ⇐ Generate-Initial-Policy(M, s0, φ)
    if Test-Policy(M, s0, φ, π0) then
        return π0
    else
        π ⇐ π0
        loop ▷ return π on interrupt
            π′ ⇐ Debug-Policy(M, s0, φ, π)
            if Test-Policy(M, s0, φ, π′) then
                return π′
            π ⇐ Better-Policy(M, s0, π, π′)


Algorithm 8.1: Generic planning algorithm for probabilistic planning based on the GTD paradigm.


continuous-time stochastic discrete event system, and policies are generated to satisfy properties specified
as UTSL formulae (Chapter 4). The approach resembles that of Drummond and Bresina (1990) for probabilistic
planning in discrete-time domains. Both approaches use temporal logic to express goal conditions,
and goal conformance is achieved through incremental plan modification.

At the core of the framework is a generic hill-climbing procedure, Find-Policy, shown as Algorithm 8.1.
The input to the procedure is a model $\mathcal{M}$ of a stochastic discrete event system, an initial state
$s_0$, and a UTSL goal condition $\phi$. The result is a policy $\pi$ such that the stochastic process $\mathcal{M}[\pi]$ (i.e. $\mathcal{M}$
controlled by $\pi$) satisfies $\phi$ when execution starts in state $s_0$.

The procedure Generate-Initial-Policy returns a seed policy for the policy search algorithm. In
Section 8.2, we describe in detail how to implement this procedure using an existing deterministic temporal
planner. Test-Policy returns true if the current policy satisfies the goal condition, and returns false if the
goal condition is violated. This amounts to solving the UTSL model checking problem $(\mathcal{M}[\pi], s_0, \phi)$, which
can be done using existing numerical solution methods or the statistical solution technique presented in
Chapter 5. Debug-Policy is responsible for debugging the current policy and returning a new policy. If the
new policy still does not satisfy the goal condition, then we retain the better of the two policies, as determined
by Better-Policy, and continue until a satisfactory policy is found or the search is interrupted.
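For readers who prefer executable code, Algorithm 8.1 can be transcribed as a higher-order Python function. This is a sketch, not the thesis implementation: the four procedures are passed in as callables with hypothetical signatures mirroring the pseudocode, and the external interrupt is modelled by an assumed iteration budget `max_rounds`, after which the best policy found so far is returned.

```python
from typing import Any, Callable

Policy = Any  # hypothetical: whatever representation the planner uses


def find_policy(model: Any, s0: Any, goal: Any,
                generate_initial: Callable[[Any, Any, Any], Policy],
                test: Callable[[Any, Any, Any, Policy], bool],
                debug: Callable[[Any, Any, Any, Policy], Policy],
                better: Callable[[Any, Any, Policy, Policy], Policy],
                max_rounds: int = 100) -> Policy:
    """Generate, test and debug hill climbing in the style of Algorithm 8.1.

    max_rounds stands in for the interrupt of the pseudocode: when it runs
    out, the best policy found so far is returned.
    """
    pi = generate_initial(model, s0, goal)
    if test(model, s0, goal, pi):
        return pi
    for _ in range(max_rounds):
        candidate = debug(model, s0, goal, pi)
        if test(model, s0, goal, candidate):
            return candidate
        pi = better(model, s0, pi, candidate)  # keep the better of the two
    return pi  # corresponds to "return pi on interrupt"
```

In the framework described here, `test` would be a statistical model checker whose sample trajectories also feed `debug`; the sketch leaves that channel to the callables.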

In the work presented here, it is essential that Test-Policy uses a statistical approach, because our
implementation of Debug-Policy relies on the sample trajectories that are produced during policy verification
for its failure analysis. Debug-Policy analyzes the sample trajectories to find reasons why the goal
Goal description | UTSL formula
reach office with probability at least 0.9 | $\mathcal{P}_{\geq 0.9}[\Diamond\, \text{office}]$
reach office within 17 time units with probability at least 0.9 | $\mathcal{P}_{\geq 0.9}[\Diamond^{[0,17]}\, \text{office}]$
reach office within 17 time units with probability at least 0.9 while not spilling coffee | $\mathcal{P}_{\geq 0.9}[\neg\text{coffee-spilled}\; \mathbf{U}^{[0,17]}\, \text{office}]$
reach office within 17 time units with probability at least 0.9 while maintaining at least a 0.5 probability of eventually recharging | $\mathcal{P}_{\geq 0.9}[\mathcal{P}_{\geq 0.5}[\Diamond\, \text{recharging}]\; \mathbf{U}^{[0,17]}\, \text{office}]$
remain stable for at least 8.2 time units with probability at least 0.7 | $\mathcal{P}_{\geq 0.7}[\Box^{[0,8.2]}\, \text{stable}]$

Table 8.1: Examples of goals expressible as UTSL formulae.


condition  is  violated,  attempts  to  debug  the  current  policy  based  on  the  outcome  of  the  failure  analysis,  and 
returns  a  new  policy. 
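To make the dependence on sample trajectories concrete, here is a sketch of a statistical Test-Policy that keeps its failure trajectories. It uses Wald's sequential probability ratio test, one standard acceptance-sampling test; it is an illustration under that assumption and not necessarily the exact procedure of Chapter 5. The indifference half-width `delta`, the error bounds `alpha` and `beta`, and the helper `bernoulli_sampler` (a hypothetical stand-in for simulating $\mathcal{M}[\pi]$) are all assumptions of the sketch.

```python
import math
import random
from typing import Any, Callable, List, Tuple

Sample = Tuple[bool, Any]  # (does the trajectory satisfy the path formula?, the trajectory)


def sprt_test_policy(
    sample_trajectory: Callable[[], Sample],
    theta: float,
    delta: float,
    alpha: float,
    beta: float,
    max_samples: int = 1_000_000,
) -> Tuple[bool, List[Any]]:
    """Wald's SPRT for P_{>=theta}[phi] with indifference region [theta - delta, theta + delta].

    Returns (verdict, failure trajectories); the failures are what Debug-Policy analyses.
    """
    p0, p1 = theta + delta, theta - delta  # H0: p >= p0 (goal holds), H1: p <= p1 (fails)
    if not 0.0 < p1 < p0 < 1.0:
        raise ValueError("need 0 < theta - delta < theta + delta < 1")
    accept_h1 = math.log((1 - beta) / alpha)  # upper threshold
    accept_h0 = math.log(beta / (1 - alpha))  # lower threshold
    llr = 0.0  # log-likelihood ratio of H1 versus H0
    failures: List[Any] = []
    for _ in range(max_samples):
        satisfied, trajectory = sample_trajectory()
        if satisfied:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
            failures.append(trajectory)
        if llr <= accept_h0:
            return True, failures
        if llr >= accept_h1:
            return False, failures
    raise RuntimeError("no decision within max_samples")


def bernoulli_sampler(p: float, seed: int) -> Callable[[], Sample]:
    """Hypothetical stand-in for simulating M[pi]: the path formula holds with probability p."""
    rng = random.Random(seed)
    return lambda: (rng.random() < p, None)
```

A rejected policy always comes back with at least one failure trajectory, since the statistic only moves toward rejection on failures.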

The model $\mathcal{M}$ is assumed to be a stochastic discrete event system with state space $S$ and event set $E$. We associate an enabling condition, $\phi_e$, with each event $e \in E$. In state $s$, the events $E_s = \{e \in E \mid s \models \phi_e\}$ are enabled and race to trigger. The event that triggers first causes a state transition to occur. For most of this chapter, we will assume that the model is a GSMP (Section 2.3.3). Algorithm 8.1 does not rely on this assumption, as it can be made to work for arbitrary stochastic discrete event systems, but we will exploit the probability structure imposed by a GSMP model to guide the generation of an initial policy and the subsequent debugging of unsatisfactory policies.

A decision dimension is added to the domain model by identifying a set $A \subseteq E$ of actions (controllable events) that can be disabled at will. A policy $\pi$ is used to determine which actions should be enabled in any given situation. We restrict our attention to stationary policies, which are mappings from states to actions. A model $\mathcal{M}$ controlled by a stationary policy $\pi$ is a stochastic discrete event system $\mathcal{M}[\pi]$ with events $\{e \in E \mid (s \models \phi_e) \wedge (e \in A \rightarrow e = \pi(s))\}$ enabled in state $s$. We can choose to be idle (i.e. have no action enabled) in a state. A special action, $a_\epsilon$, is used to represent idleness and has an enabling condition that is always true.
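The set of events enabled in $\mathcal{M}[\pi]$ can be computed directly from this definition. The sketch below assumes a state is the set of true Boolean variables and that enabling conditions are Python predicates; the event names in the usage test and the name `a_eps` for $a_\epsilon$ are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Set

State = FrozenSet[str]  # the Boolean state variables that are true
IDLE = "a_eps"          # hypothetical name for the idle action a_epsilon


def enabled_events(
    s: State,
    enabling: Dict[str, Callable[[State], bool]],  # event e -> phi_e (IDLE maps to always-true)
    actions: Set[str],                              # A together with IDLE
    policy: Callable[[State], str],                 # stationary policy pi
) -> Set[str]:
    """Events enabled in state s of M[pi]: {e | s |= phi_e and (e in A implies e = pi(s))}."""
    chosen = policy(s)
    return {
        e
        for e, phi_e in enabling.items()
        if phi_e(s) and (e not in actions or e == chosen)
    }
```

Exogenous events are unaffected by the policy; among the actions, only the one selected by $\pi(s)$ survives.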

We use a subset of UTSL to express plan objectives, consisting of formulae of the form $\mathcal{P}_{\bowtie\theta}[\Phi\ \mathcal{U}^{I}\ \Psi]$ and formulae that can be transformed to this form, such as $\mathcal{P}_{\bowtie\theta}[\Diamond^{I}\,\Phi]$. A wide variety of goals can be expressed with this subset of UTSL. Table 8.1 shows examples of achievement goals, goals with safety constraints on execution paths, and maintenance/prevention goals. We limit our attention to goal formulae with finite time bounds.
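As a worked example of the transformation to until form, the eventually and always goals of Table 8.1 can be rewritten with two standard identities for the derived operators, using $\Pr(\Box^{I}\Phi) = 1 - \Pr(\Diamond^{I}\neg\Phi)$. Note that a maintenance goal turns into an upper-bounded ($\leq$) until formula:

```latex
\begin{align*}
\mathcal{P}_{\geq\theta}\bigl[\Diamond^{I}\,\Phi\bigr]
  &\equiv \mathcal{P}_{\geq\theta}\bigl[\top\ \mathcal{U}^{I}\ \Phi\bigr] \\
\mathcal{P}_{\geq\theta}\bigl[\Box^{I}\,\Phi\bigr]
  &\equiv \mathcal{P}_{\leq 1-\theta}\bigl[\Diamond^{I}\,\neg\Phi\bigr]
   \equiv \mathcal{P}_{\leq 1-\theta}\bigl[\top\ \mathcal{U}^{I}\ \neg\Phi\bigr] \\
\mathcal{P}_{\geq 0.7}\bigl[\Box^{[0,8.2]}\,\mathit{stable}\bigr]
  &\equiv \mathcal{P}_{\leq 0.3}\bigl[\top\ \mathcal{U}^{[0,8.2]}\ \neg\mathit{stable}\bigr]
\end{align*}
```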
8.2 Initial Policy Generation

Given a planning problem $(\mathcal{M}, s_0, \phi)$, we want to find a stationary policy $\pi : S \rightarrow A \cup \{a_\epsilon\}$ such that $\mathcal{M}[\pi], s_0 \models \phi$. Algorithm 8.1 outlines a procedure for finding such a policy by means of local search. The efficiency of the procedure depends on the quality of the initial policy returned by Generate-Initial-Policy. A quick solution would be to simply return the null-policy, mapping every state to the idle action $a_\epsilon$, but this ignores the goal condition of the planning problem. If we can make a more informed choice of initial policy, it is likely to have fewer bugs than the null-policy, thus requiring fewer repairs.

We present an implementation of Generate-Initial-Policy that relaxes the original planning problem by ignoring uncertainty and solves the resulting deterministic planning problem using an existing temporal planner. Our implementation uses a slightly modified version of VHPOP (Younes and Simmons 2003), a heuristic partial order causal link (POCL) planner with support for PDDL2.1 durative actions (Fox and Long 2003).

8.2.1  Conversion  to  Deterministic  Planning  Problem 

We assume a GSMP model. This means that a distribution $G_e$ is associated with each event $e$, governing the time from when $e$ becomes enabled until it triggers, provided $e$ remains continuously enabled during that time period. At the triggering of event $e$ in state $s$, the next state is determined by a probability distribution $p_e(\cdot\,; s)$. If we have a factored representation of the state space, with Boolean state variables $V$, then the distribution $p_e(\cdot\,; s)$ can be represented implicitly by an effect formula $\mathit{eff}_e$ using the formalism presented by Rintanen (2003). Effects are recursively defined as follows:

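1. $\top$ is the null-effect.

2. $b$ and $\neg b$ are effects if $b \in V$ is a Boolean state variable.

3. $\mathit{eff}_1 \wedge \cdots \wedge \mathit{eff}_n$ is an effect if $\mathit{eff}_1$ through $\mathit{eff}_n$ are effects.

4. $c \rhd \mathit{eff}$ is an effect if $c$ is a formula over $V$ and $\mathit{eff}$ is an effect.

5. $p_1\,\mathit{eff}_1 \mid \cdots \mid p_n\,\mathit{eff}_n$ is an effect if $\mathit{eff}_1$ through $\mathit{eff}_n$ are effects, $p_i > 0$ for all $i \in \{1, \ldots, n\}$, and $\sum_{i=1}^{n} p_i = 1$.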
The language PPDDL+, described in Appendix B, uses this representation. Younes and Littman (2004) describe how to compute an explicit representation of $p_e(\cdot\,; s)$ from an effect formula.
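One straightforward way to expand an effect formula into an explicit next-state distribution is sketched below; it is an illustration of the five rules above, not necessarily the procedure of Younes and Littman (2004). It assumes states are sets of true variables, conditions $c$ are Python predicates evaluated in the current state, and the literals within each outcome are consistent. The class names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

State = FrozenSet[str]  # Boolean variables that are true


@dataclass(frozen=True)
class Null:          # rule 1: the null-effect
    pass


@dataclass(frozen=True)
class Lit:           # rule 2: b or (not b)
    var: str
    value: bool


@dataclass(frozen=True)
class Conj:          # rule 3: eff_1 and ... and eff_n
    effs: Tuple["Effect", ...]


@dataclass(frozen=True)
class When:          # rule 4: c |> eff
    cond: Callable[[State], bool]
    eff: "Effect"


@dataclass(frozen=True)
class Prob:          # rule 5: p_1 eff_1 | ... | p_n eff_n
    branches: Tuple[Tuple[float, "Effect"], ...]


Effect = Union[Null, Lit, Conj, When, Prob]
Outcome = Tuple[float, Dict[str, bool]]  # probability and literal assignments


def outcomes(eff: Effect, s: State) -> List[Outcome]:
    """Deterministic outcomes of eff in state s (conditions are evaluated in s)."""
    if isinstance(eff, Null):
        return [(1.0, {})]
    if isinstance(eff, Lit):
        return [(1.0, {eff.var: eff.value})]
    if isinstance(eff, When):
        return outcomes(eff.eff, s) if eff.cond(s) else [(1.0, {})]
    if isinstance(eff, Prob):
        return [(p * q, lits) for p, sub in eff.branches for q, lits in outcomes(sub, s)]
    if isinstance(eff, Conj):
        result: List[Outcome] = [(1.0, {})]
        for sub in eff.effs:  # combine the independent outcomes of each conjunct
            result = [(p * q, {**a, **b}) for (p, a), (q, b) in product(result, outcomes(sub, s))]
        return result
    raise TypeError(f"not an effect: {eff!r}")


def next_state_distribution(eff: Effect, s: State) -> Dict[State, float]:
    """Explicit p_e(. ; s) obtained by applying each outcome to s."""
    dist: Dict[State, float] = {}
    for p, lits in outcomes(eff, s):
        true_vars = set(s)
        for var, value in lits.items():
            if value:
                true_vars.add(var)
            else:
                true_vars.discard(var)
        key = frozenset(true_vars)
        dist[key] = dist.get(key, 0.0) + p
    return dist
```

For the "crash" event of Figure 8.1, `Prob(((0.4, Lit("down", True)), (0.6, Lit("broken", True))))` applied in a state where `up` holds yields two successor states with probabilities 0.4 and 0.6.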

We relax a temporal probabilistic planning problem by treating all events of a model equally, ignoring the fact that some events are not controllable. In other words, all events are considered to be actions that the deterministic planner can choose to include in a plan. We eliminate probabilistic effects by splitting events with probabilistic effects into multiple events with deterministic effects. Each new event has the same enabling condition as the original event and an effect representing a separate outcome of the original event's probabilistic effect. An event with probabilistic effect $p_1\,\mathit{eff}_1 \mid \cdots \mid p_n\,\mathit{eff}_n$ is split into $n$ events, the $i$th event having deterministic effect $\mathit{eff}_i$.¹ Furthermore, instead of a probability distribution over possible event durations, we associate an interval with each event representing the possible durations for the event. This interval is simply the support of the probability distribution for the event delay. The deterministic temporal planner is permitted to select any duration within the given interval for an event that is part of a plan. In the next section, when we discuss policy debugging, we consider ways of constraining the choice of action and event durations based on information gathered during the verification phase.

With these transformations, each event can be represented as one or more PDDL2.1 durative actions with interval constraints on the duration, with the enabling condition of the event as a condition that must hold over the entire duration of the action, and with the effect associated with the end of the durative action. Figure 8.1 shows a stochastic event with delay distribution $U(0, 10)$ and a probabilistic effect with two outcomes, and the two durative actions with deterministic effects that are used to represent the stochastic event. The purpose of the transformation is to make every possible outcome of a stochastic event available to the deterministic planner.
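The splitting step can be sketched as a small generator of PDDL text that reproduces the bottom half of Figure 8.1. The `DelayedEvent` record and its fields are hypothetical; the sketch assumes the probabilistic effect is already flattened into its outcomes and that the delay support is given. Omitting the upper bound for an unbounded support (such as an exponential delay) is an assumption of the sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DelayedEvent:
    name: str
    condition: str                      # PDDL enabling condition, e.g. "(up)"
    delay_support: Tuple[float, float]  # support of the delay distribution
    outcomes: List[Tuple[float, str]]   # (probability, deterministic PDDL effect)


def to_durative_actions(ev: DelayedEvent) -> List[str]:
    """One deterministic durative action per outcome; the probabilities are dropped."""
    lo, hi = ev.delay_support
    bounds = [f"(>= ?duration {lo:g})"]
    if math.isfinite(hi):  # an unbounded support gets no upper bound
        bounds.append(f"(<= ?duration {hi:g})")
    duration = bounds[0] if len(bounds) == 1 else f"(and {' '.join(bounds)})"
    c = ev.condition
    return [
        f"(:durative-action {ev.name}{i}\n"
        f"  :duration {duration}\n"
        f"  :condition (and (at start {c}) (over all {c}) (at end {c}))\n"
        f"  :effect (at end {eff}))"
        for i, (_, eff) in enumerate(ev.outcomes, start=1)
    ]


if __name__ == "__main__":
    crash = DelayedEvent("crash", "(up)", (0, 10), [(0.4, "(down)"), (0.6, "(broken)")])
    print("\n\n".join(to_durative_actions(crash)))
```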

A UTSL goal condition of the form $\mathcal{P}_{\geq\theta}[\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi]$ is converted into a goal for the deterministic planning problem as follows. We make $\Psi$ a goal condition that must become true some time between $\tau$ and $\tau'$ time units after the start of the plan, while $\Phi$ becomes an invariant condition that must hold until $\Psi$ is satisfied. We can represent this goal in the temporal POCL framework as a durative action with no effects, with an invariant condition $\Phi$ that must hold over the duration of the action, and a condition $\Psi$ associated with the end of the action. We add the temporal constraints that the start of the goal action must be scheduled at time 0 and that the end of the action must be scheduled in the interval $[\tau, \tau']$. VHPOP records all such temporal constraints in a simple temporal network (Dechter et al. 1991), allowing for efficient temporal inference during planning.

¹ Nested probabilistic effects may require further splitting. Any effect formula can be transformed to the form $p_1\,\mathit{eff}_1 \mid \cdots \mid p_n\,\mathit{eff}_n$, where $\mathit{eff}_i$ is a deterministic effect, although this may result in an exponential increase in the size of the effect formula (Rintanen 2003).

(:delayed-event crash
  :delay (uniform 0 10)
  :condition (up)
  :effect (probabilistic 0.4 (down) 0.6 (broken)))

(:durative-action crash1
  :duration (and (>= ?duration 0) (<= ?duration 10))
  :condition (and (at start (up)) (over all (up)) (at end (up)))
  :effect (at end (down)))

(:durative-action crash2
  :duration (and (>= ?duration 0) (<= ?duration 10))
  :condition (and (at start (up)) (over all (up)) (at end (up)))
  :effect (at end (broken)))

Figure 8.1: A stochastic event (top) and two durative deterministic actions (bottom) representing the stochastic event.
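VHPOP's internal data structures are not described here, but the kind of inference a simple temporal network supports can be sketched with the textbook consistency check: encode each constraint $lo \leq x_j - x_i \leq hi$ as two edges of a distance graph and look for a negative cycle with all-pairs shortest paths. In the usage test, time point 0 is the plan origin, 1 and 2 are the start and end of the goal action, and 3 is the end of a hypothetical action that achieves $\Psi$.

```python
import math
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (i, j, w) encodes the constraint x_j - x_i <= w


def interval(i: int, j: int, lo: float, hi: float) -> List[Edge]:
    """Constraint lo <= x_j - x_i <= hi as two distance-graph edges."""
    return [(i, j, hi), (j, i, -lo)]


def stn_consistent(n: int, edges: List[Edge]) -> bool:
    """Floyd-Warshall over the distance graph; consistent iff there is no negative cycle."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        d[i][j] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))
```

With the goal action started at time 0, an achieving action that cannot end before time 20 is incompatible with a goal window of $[0, 17]$ but compatible with $[0, 30]$.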

For UTSL goals of the form $\mathcal{P}_{\leq\theta}[\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi]$, we instead want to find plans representing executions not satisfying the path formula $\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi$. We then use $\neg\Psi$ as an invariant condition that must hold in the interval $[\tau, \tau']$. This can be represented by a durative action scheduled to start at time $\tau$ and end at time $\tau'$ with invariant condition $\neg\Psi$ and no effect. Note that it is not necessary to achieve $\neg\Phi$ in order for $\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi$ to be false, so we do not include $\Phi$ in the deterministic planning problem. This means that an empty plan will satisfy the goal condition, unless $\tau$ is zero and $\Psi$ holds in the initial state, in which case the problem lacks a solution. We therefore return the null-policy as an initial policy for such goals.
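The two goal conversions can be summarized as a small function that builds the effect-free goal action. This is a sketch: `GoalAction`, its fields, and the use of PDDL-style condition strings are hypothetical choices for illustration, and, as stated above, the initial policy for $\leq$ goals is the null-policy regardless of this representation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Window = Tuple[float, float]  # earliest and latest time for a time point


@dataclass
class GoalAction:
    """Effect-free durative 'goal action' handed to the deterministic planner."""
    invariant: str                # must hold over the whole action
    end_condition: Optional[str]  # must hold at the end of the action
    start: Window
    end: Window


def convert_goal(bound: str, phi: str, psi: str, tau: float, tau_prime: float) -> GoalAction:
    """Goal action for P_{bound theta}[phi U^[tau, tau'] psi]."""
    if bound == ">=":
        # Achieve psi somewhere in [tau, tau'] while phi is invariant; start at time 0.
        return GoalAction(phi, psi, (0.0, 0.0), (tau, tau_prime))
    if bound == "<=":
        # Keep (not psi) throughout [tau, tau']; phi is deliberately left out.
        return GoalAction(f"(not {psi})", None, (tau, tau), (tau_prime, tau_prime))
    raise ValueError(f"unsupported bound {bound!r}")
```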

There are a few additional constraints that we enforce in the modified version of VHPOP. The first is that we do not allow concurrent actions. This is due to the restriction of policies to mappings from states to single actions. The restriction is not severe, however, since an “action” with extended delay can be modeled as a controllable event with short delay to start the action and an exogenous event to end the action, allowing additional actions to be executed before the temporally extended action completes. For example, a “drive” action with extended duration can be represented by a “start” action and an “arrive” event. The second constraint is that separate instances of the same exogenous event cannot overlap in time. For example, if one instance of the “crash” event is enabled at time $t$ and scheduled to trigger at time $t'$, then no other instances of “crash” can be scheduled to be enabled or trigger in the interval $[t, t']$. This constraint follows from the GSMP domain model. Both constraints are of the same nature and are represented in the planner as a new flaw type, associated with two events $e_1$ and $e_2$, that can be resolved in ways analogous to promotion and demotion for regular POCL threat resolution: either the end of $e_1$ must come before the start of $e_2$, or the start of $e_1$ must come after the end of $e_2$.
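The sketch below shows how the new flaw type could be detected on a fixed schedule and which two orderings resolve it. The schedule format and function names are hypothetical. Treating intervals as half-open, so that an event may become enabled exactly when another triggers, is an assumption; without it, consecutive actions of a serial plan would be flagged as concurrent.

```python
from typing import List, Set, Tuple

Interval = Tuple[float, float]  # (enabling time, trigger time)
Scheduled = Tuple[str, Interval]


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open overlap: an instance may start exactly when another one triggers."""
    return a[0] < b[1] and b[0] < a[1]


def flaws(schedule: List[Scheduled], actions: Set[str]) -> List[Tuple[int, int]]:
    """Index pairs violating 'no concurrent actions' or
    'no overlapping instances of the same exogenous event'."""
    found = []
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            (ei, ti), (ej, tj) = schedule[i], schedule[j]
            both_actions = ei in actions and ej in actions
            same_exogenous = ei == ej and ei not in actions
            if (both_actions or same_exogenous) and overlaps(ti, tj):
                found.append((i, j))
    return found


def resolutions(first: str, second: str) -> List[Tuple[str, str]]:
    """The two ways to resolve a flaw; each pair (x, y) reads 'x must come before y'."""
    return [(f"end({first})", f"start({second})"),
            (f"end({second})", f"start({first})")]
```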

The state of a GSMP can change only at the triggering of an event. At this point, other events can become enabled. It is not possible, however, for an event to become enabled between state transitions. Before it is returned by Generate-Initial-Policy, a plan is adjusted to ensure that events are scheduled to become enabled at the triggering of some other event, and not at an arbitrary point in time. A plan now represents an execution of actions and exogenous events satisfying the path formula $\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi$, possibly ignoring the adverse effects of other exogenous events, which are left for the debugging phase to discover.

8.2.2  From  Plan  to  Policy 

A plan returned by VHPOP is a set of triples $(t_i, e_i, d_i)$, where $e_i$ is an event, $t_i$ is the time that $e_i$ is scheduled to become enabled, and $d_i$ is the delay of $e_i$ (i.e. $e_i$ is scheduled to trigger at time $t_i + d_i$). Given a plan, we now want to generate a policy. We represent a policy using a decision tree (cf. Boutilier et al. 1995), and generate it by converting a plan into a set of training examples composed of state-action pairs $(s_i, e_i)$, with $s_i \in S$ and $e_i \in A \cup \{a_\epsilon\}$, and then generating a decision tree from these training examples. The training examples are obtained by serializing the plan returned by VHPOP and executing the sequence of events, starting in the initial state. A decision tree policy can be compiled into a set of test-action pairs, the policy representation used by CIRCA (Musliner et al. 1995), to facilitate efficient and predictable execution behavior.

We serialize a plan by sorting the events in ascending order based on their trigger time, breaking ties nondeterministically. The first event to trigger, call it $e_0$, is applied to the initial state $s_0$, resulting in a state $s_1$. If $e_0$ is an action, then this gives rise to a training example $(s_0, e_0)$. Otherwise, the first event gives rise to the training example $(s_0, a_\epsilon)$, signifying that we are waiting for something beyond our control to happen in state $s_0$. We continue to generate training examples in this fashion until there are no unprocessed events left in the plan. Given a set of training examples for the initial plan, we use regular decision tree induction (Quinlan 1986) to generate an initial policy. The policy will assign actions even to states that are not included in the training set. It is left to the debugging phase to identify overgeneralization.
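The serialization step can be sketched as follows, using the plan of Figure 8.2(a) as data. To keep the sketch self-contained, states are replaced by a hypothetical stand-in (the number of events applied so far) instead of real planning states, and ties are broken by plan order because Python's `sorted` is stable. The decision tree induction step is not shown.

```python
from typing import Callable, List, Set, Tuple, TypeVar

S = TypeVar("S")
PlanStep = Tuple[float, str, float]  # (t_i, e_i, d_i); e_i triggers at t_i + d_i
IDLE = "a_eps"                       # hypothetical name for the idle action


def training_examples(plan: List[PlanStep], s0: S, actions: Set[str],
                      apply: Callable[[S, str], S]) -> List[Tuple[S, str]]:
    """Serialize a plan by trigger time and label every visited state."""
    ordered = sorted(plan, key=lambda step: step[0] + step[2])  # stable: ties keep plan order
    examples: List[Tuple[S, str]] = []
    s = s0
    for _, e, _ in ordered:
        examples.append((s, e if e in actions else IDLE))  # exogenous event: wait
        s = apply(s, e)
    return examples


# The plan of Figure 8.2(a).
PLAN: List[PlanStep] = [
    (0, "enter-taxi me pgh-taxi cmu", 1),
    (0, "depart-plane plane pgh-airport mpls-airport", 60),
    (1, "depart-taxi me pgh-taxi cmu pgh-airport", 1),
    (2, "arrive-taxi pgh-taxi cmu pgh-airport", 20),
    (22, "leave-taxi me pgh-taxi pgh-airport", 1),
    (23, "check-in me plane pgh-airport", 1),
    (60, "arrive-plane plane pgh-airport mpls-airport", 90),
    (150, "enter-taxi me mpls-taxi mpls-airport", 1),
    (151, "depart-taxi me mpls-taxi mpls-airport honeywell", 1),
    (152, "arrive-taxi mpls-taxi mpls-airport honeywell", 20),
    (172, "leave-taxi me mpls-taxi honeywell", 1),
]
ACTIONS = {e for _, e, _ in PLAN
           if e.split()[0] in {"enter-taxi", "depart-taxi", "leave-taxi", "check-in"}}
```

Running `training_examples(PLAN, 0, ACTIONS, lambda s, e: s + 1)` gives the eleven examples discussed in the text: the first two states map to "enter-taxi" and "depart-taxi", and the third maps to the idle action because "arrive-taxi" is exogenous.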

To illustrate the process of generating an initial policy, consider the planning problem described by Younes et al. (2003), which is a continuous-time variation of a problem developed by Blythe (1994). In this problem, the goal is to have a person transport a package from CMU in Pittsburgh to Honeywell in Minneapolis in at most 300 time units with probability at least 0.9, without losing it on the way. In UTSL, this goal can be expressed as $\mathcal{P}_{\geq 0.9}[\neg\mathit{lost}_{\mathit{pkg}}\ \mathcal{U}^{[0,300]}\ \mathit{at}_{\mathit{me},\mathit{honeywell}} \wedge \mathit{carrying}_{\mathit{me},\mathit{pkg}}]$. The package can be transported between the two cities by airplane and between two locations within the same city by taxi. There is one taxi in each city. The Pittsburgh taxi is initially at CMU, while the Minneapolis taxi is at the airport. There is one airplane available, and it is initially at the Pittsburgh airport. The airplane can get filled if we do not have a reservation, preventing us from boarding it when we arrive at the Pittsburgh airport. A reservation can be made from CMU. Taxis located at airports serve other customers periodically, which means that we may have to wait for a taxi when we arrive at the Minneapolis airport. If we stay too long at an airport, the package can get lost, although this can be prevented by putting the package in storage. The departure of the airplane from an airport is controlled by an exogenous event, which means that we can miss the departure if it takes too long to get to the airport.

Figure 8.2(a) shows the plan generated by the deterministic temporal planner. The plan schedules two events to become enabled at time zero, one being the action to enter a taxi at CMU, and the other being the exogenous event causing the plane to depart from Pittsburgh to Minneapolis (actions are identified by an entry in the second column of the table in Figure 8.2(a)). The “enter-taxi” action is scheduled to trigger first, resulting in a training example mapping the initial state to this action. The next state is mapped to the first “depart-taxi” action, while the state following the triggering of that action is mapped to the idle action. This is because the next event (“arrive-taxi”) is not an action. Eight additional training examples can be extracted from the plan, and the decision tree representation of the policy learned from the eleven training examples is shown in Figure 8.2(b). This policy, for example, maps all states satisfying $\mathit{at}_{\mathit{pgh\text{-}taxi},\mathit{cmu}} \wedge \mathit{at}_{\mathit{me},\mathit{cmu}}$ to the action labeled $a_1$ (the first “enter-taxi” action in the plan), while states where $\mathit{at}_{\mathit{pgh\text{-}taxi},\mathit{cmu}}$, $\mathit{at}_{\mathit{plane},\mathit{mpls\text{-}airport}}$, and $\mathit{at}_{\mathit{me},\mathit{pgh\text{-}airport}}$ are all false and $\mathit{in}_{\mathit{me},\mathit{plane}}$ is true are mapped to the idle action $a_\epsilon$.

Additional training examples can be obtained from plans with multiple events scheduled to trigger at the same time by considering different trigger orderings of the simultaneous events. If two events $e_1$ and $e_2$ are both scheduled to trigger at time $t$, we would get one set of training examples by applying $e_1$ before $e_2$, and a second set by applying $e_2$ before $e_1$. This can result in different training examples if one of the events is an action.

$t_i$: $e_i$ [$d_i$] | act.
0: (enter-taxi me pgh-taxi cmu) [1] | $a_1$
0: (depart-plane plane pgh-airport mpls-airport) [60] |
1: (depart-taxi me pgh-taxi cmu pgh-airport) [1] | $a_2$
2: (arrive-taxi pgh-taxi cmu pgh-airport) [20] |
22: (leave-taxi me pgh-taxi pgh-airport) [1] | $a_3$
23: (check-in me plane pgh-airport) [1] | $a_4$
60: (arrive-plane plane pgh-airport mpls-airport) [90] |
150: (enter-taxi me mpls-taxi mpls-airport) [1] | $a_5$
151: (depart-taxi me mpls-taxi mpls-airport honeywell) [1] | $a_6$
152: (arrive-taxi mpls-taxi mpls-airport honeywell) [20] |
172: (leave-taxi me mpls-taxi honeywell) [1] | $a_7$

(a) Plan for simplified deterministic planning problem.

[Decision tree diagram]

(b) Policy generated from plan in (a).

Figure 8.2: (a) Initial plan and (b) policy for transportation problem. Leaves in the decision tree are labeled by actions, with labels taken from the table in (a). To find the action selected by the policy for a state $s$, start at the root of the decision tree. Traverse the tree until a leaf node is reached by following the left branch of a decision node if $s$ satisfies the test at the node and following the right branch otherwise.

Debug-Policy($\mathcal{M}$, $s_0$, $\phi$, $\pi$)
  $S \Leftarrow$ set of states occurring in $\sigma_\perp$
  $s \Leftarrow$ some state in $S$
  $a \Leftarrow$ some action in $\{a \in A \cup \{a_\epsilon\} \mid s \models \phi_a\} \setminus \{\pi(s)\}$
  $\pi' \Leftarrow \pi$, but with the mapping of $s$ to $a$
  return $\pi'$

Algorithm 8.2: Generic nondeterministic procedure for debugging a policy.


8.3  Policy  Debugging 

During verification of a policy $\pi$ for a planning problem $(\mathcal{M}, s_0, \phi)$, a set of sample trajectories $\sigma = \{\sigma_1, \ldots, \sigma_n\}$ is generated for the stochastic process $\mathcal{M}[\pi]$ with initial state $s_0$. If the policy $\pi$ does not satisfy the goal condition $\phi$, then these sample trajectories can help us understand the “bugs” of $\pi$ and provide us with valuable information on how to debug the policy.

Let $\sigma_\perp \subseteq \sigma$ denote the set of trajectories over which the path formula $\varphi$ is verified not to hold. This set of sample trajectories provides information on how a policy can fail to satisfy the specified goal condition. We can use this information to guide policy debugging, without relying on model-specific knowledge.

To debug a policy for goal condition $\mathcal{P}_{\geq\theta}[\varphi]$, we must lower the probability measure of the set of trajectories not satisfying $\varphi$. Each member $\sigma_i \in \sigma_\perp$ is a trajectory prefix $\{(s_0, t_0), \ldots, (s_k, t_k)\}$ providing evidence on how a policy can fail to achieve the goal condition. We could, conceivably, improve a policy by modifying it so that the sequence of states appearing along a sample trajectory $\sigma_i \in \sigma_\perp$ is interrupted. Algorithm 8.2 shows a generic procedure for debugging a policy based on this simple principle. A state is nondeterministically selected from the set of states that occur along some failure trajectory, and an alternative action is assigned to that state, resulting in a modified policy.
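A runnable version of this generic procedure has to replace the nondeterministic choices with something concrete. The sketch below uses seeded random choices and represents a policy as a dictionary from states to actions; both are assumptions for illustration (the chapter itself uses decision-tree policies). The `applicable` callable stands for the set $\{a \in A \cup \{a_\epsilon\} \mid s \models \phi_a\}$.

```python
import random
from typing import Callable, Dict, Hashable, Sequence, Set

State = Hashable


def debug_policy_generic(
    policy: Dict[State, str],
    failure_trajectories: Sequence[Sequence[State]],
    applicable: Callable[[State], Set[str]],  # {a in A u {a_eps} | s |= phi_a}
    rng: random.Random,
) -> Dict[State, str]:
    """Randomized stand-in for the nondeterministic choices of Algorithm 8.2."""
    states = sorted({s for traj in failure_trajectories for s in traj}, key=repr)
    s = rng.choice(states)                                   # some state in S
    alternatives = sorted(applicable(s) - {policy.get(s)})  # any other applicable action
    new_policy = dict(policy)                                # pi' starts as a copy of pi
    if alternatives:
        new_policy[s] = rng.choice(alternatives)
    return new_policy
```

Exactly one state along a failure trajectory gets a different action, which illustrates why this model-independent choice gives little guidance on its own.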

The sample trajectories can help us focus the debugging effort on the relevant parts of the state space, in particular if failure occurs early along a trajectory. There is little, however, to guide the state and action choice in the model-independent approach. We next present model-dependent techniques for analyzing sample trajectories that can lead to a more efficient implementation of the Debug-Policy procedure. The result of the analysis is a set of ranked failure scenarios. A failure scenario can be fed to the deterministic temporal planner, which will try to generate a plan that takes the failure scenario into account. The resulting plan, if one exists, can be used to debug the current policy.

8.3.1  Analysis  of  Sample  Trajectories 

Policy verification generates a set of trajectory prefixes $\sigma = \{\sigma_1, \ldots, \sigma_n\}$, with each trajectory prefix being of the form

$\sigma_i = \{(s_{i0}, t_{i0}, e_{i0}), \ldots, (s_{i,k_i-1}, t_{i,k_i-1}, e_{i,k_i-1}), (s_{ik_i}, t_{ik_i})\}$.

This form differs slightly from our previous representation of sample trajectories in that it includes the triggering events. Knowing which events cause state transitions, and not only the time at which the transitions occur, is essential in our analysis. The goal of the analysis is to produce a set of failure scenarios that summarizes the information in the sample trajectories. A failure scenario is a sequence $(e_1@t_1, \ldots, e_n@t_n)$ of events and trigger times, and is constructed with a specific event $e_k$, $1 \leq k \leq n$, in mind. A failure scenario for $e_k$ is meant to represent an average trajectory that does not satisfy the goal condition while including a state transition caused by $e_k$. Each failure scenario is assigned a score, with a lower score indicating higher severity.

We start the construction of failure scenarios by computing a value, relative to a UTSL goal formula $\mathcal{P}_{\geq\theta}[\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi]$, for each state occurring along a sample trajectory. The value of a state is between $-1$ and $1$, and signifies the closeness to success or failure, ignoring timing information and counting only the number of transitions. A large positive value indicates closeness to success, while a large negative value indicates closeness to failure. State values are computed by constructing a discrete-time Markov reward process representing an abstract view of the sample trajectories (cf. Riley and Veloso 2004). The state space for this Markov reward process is the set of states that occur along some sample trajectory. The transition probability $p(s'; s)$ is defined as the number of times $s$ is immediately followed by $s'$ along the sample trajectories divided by the total number of occurrences of $s$. Let $k_s$ be the number of trajectory prefixes that end in state $s$ and satisfy the path formula $\Phi\ \mathcal{U}^{[\tau,\tau']}\ \Psi$, and let $l_s$ be the number of trajectory prefixes that end in $s$ and do not satisfy the path formula. Then, the immediate reward $R(s)$ associated with state $s$ is $(k_s - l_s)/(k_s + l_s)$, or $0$ if no trajectory ends in $s$.² The values of states are computed using the recurrence

$$V(s) = R(s) + \gamma \sum_{s' \in S} p(s'; s)\, V(s'),$$

where $\gamma < 1$ is a discount factor. The discount factor permits us to control the influence a success or failure has on the value of states at some distance from the point along a trajectory at which success or failure is determined to occur. State values can be computed iteratively, with the initial value of a state being equal to its immediate reward.
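The construction of the Markov reward process and the iterative value computation can be sketched as below. The input format (each prefix given as its visited states plus a flag for whether it satisfies the path formula), the default $\gamma = 0.9$, and the fixed iteration count are assumptions of the sketch. For a goal with an upper probability bound, the rewards would be negated, as noted in footnote 2.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

State = Hashable


def state_values(
    prefixes: Sequence[Tuple[Sequence[State], bool]],  # (visited states, path formula holds?)
    gamma: float = 0.9,
    iterations: int = 200,
) -> Dict[State, float]:
    """Values V(s) of the Markov reward process built from sampled trajectory prefixes."""
    visits: Counter = Counter()                         # occurrences of s
    moves: Dict[State, Counter] = defaultdict(Counter)  # s -> counts of immediate successors s'
    ends_ok: Counter = Counter()                        # k_s
    ends_bad: Counter = Counter()                       # l_s
    for states, satisfied in prefixes:
        visits.update(states)
        for s, s_next in zip(states, states[1:]):
            moves[s][s_next] += 1
        (ends_ok if satisfied else ends_bad)[states[-1]] += 1
    reward = {
        s: (ends_ok[s] - ends_bad[s]) / (ends_ok[s] + ends_bad[s])
        if ends_ok[s] + ends_bad[s] else 0.0
        for s in visits
    }
    v = dict(reward)  # initial value: the immediate reward R(s)
    for _ in range(iterations):  # V(s) = R(s) + gamma * sum_{s'} p(s'; s) V(s')
        v = {
            s: reward[s] + gamma * sum(n / visits[s] * v[t] for t, n in moves[s].items())
            for s in visits
        }
    return v
```

With one successful prefix `a, b, goal` and one failed prefix `a, c, bad`, state `b` gets value 0.9, state `c` gets -0.9, and the shared state `a` ends up at 0.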

The next step is to assign a value to each event that occurs along some sample trajectory. Each triple $s \xrightarrow{e} s'$, meaning that $e$ causes a transition from $s$ to $s'$, is given the value $V(s') - V(s)$, which can be seen as the value contribution of $e$. The value $V(e)$ of an event $e$ is the sum of the values of all triples that $e$ is part of. This way, an event that occurs often but early on the path to failure can have a lower value than an event that leads directly to failure but only rarely. For later use, the mean $\mu_e$ and standard deviation $\sigma_e$ over triples involving $e$ are also computed. The event with the largest negative value can be thought of as the “bug” contributing the most to failure, and we want to plan to avoid this event or to prevent it from having negative effects. The event, by itself, may not be sufficient to understand why failure occurs. A failure scenario provides the context in which the event leads to failure.
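Given state values, the event values and the statistics $\mu_e$ and $\sigma_e$ follow directly. In this sketch, triples are `(s, e, s_next)` tuples and `v` maps states to values; using the population standard deviation is an assumption, since the text does not say which estimator is meant.

```python
import statistics
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Triple = Tuple[Hashable, str, Hashable]  # (s, e, s'): e causes a transition from s to s'
Stats = Tuple[float, float, float]       # (V(e), mu_e, sigma_e)


def event_statistics(triples: Iterable[Triple], v: Dict[Hashable, float]) -> Dict[str, Stats]:
    """Sum, mean, and standard deviation of V(s') - V(s) over the triples of each event."""
    contributions: Dict[str, List[float]] = defaultdict(list)
    for s, e, s_next in triples:
        contributions[e].append(v[s_next] - v[s])  # value contribution of this triple
    return {
        e: (sum(c), statistics.mean(c), statistics.pstdev(c))
        for e, c in contributions.items()
    }


def worst_event(stats: Dict[str, Stats]) -> str:
    """The event with the most negative value V(e), i.e. the prime 'bug' candidate."""
    return min(stats, key=lambda e: stats[e][0])
```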

We construct a failure scenario for each event $e$ by combining the information from all failure trajectories $\sigma_i$ containing a triple $s \xrightarrow{e} s'$ such that $V(s') - V(s) \leq \mu_e + \sigma_e$. The reason for the cutoff is to exclude information from failure trajectories where an event contributes to failure significantly less than on average, so that the aggregate information is representative of the “bug” being considered. For example, we fail to deliver the package to Honeywell in Minneapolis if the airplane is filled before we have a chance to board it. However, not every occurrence of a “fill-plane” event along a failure trajectory represents the same “bug”. If the airplane is filled while we are on our way to the Pittsburgh airport, but we also arrive at the airport after the airplane has departed, the “fill-plane” event is less responsible for failure than if we had arrived at the airport in time for departure.
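The cutoff amounts to a simple filter over the failure trajectories. In this sketch, each trajectory is a list of `(s, e, s_next)` triples, and `v`, `mu_e`, and `sigma_e` are assumed to have been computed as described in the text. The state names and values in the usage test are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Triple = Tuple[Hashable, str, Hashable]  # (s, e, s')


def relevant_failures(
    failures: Sequence[Sequence[Triple]],
    e: str,
    v: Dict[Hashable, float],
    mu_e: float,
    sigma_e: float,
) -> List[Sequence[Triple]]:
    """Failure trajectories with a triple s -e-> s' such that V(s') - V(s) <= mu_e + sigma_e."""
    cutoff = mu_e + sigma_e
    return [
        traj for traj in failures
        if any(ev == e and v[s2] - v[s1] <= cutoff for s1, ev, s2 in traj)
    ]
```

A trajectory in which the event contributes only mildly to failure (here, -0.2 against a cutoff of -0.4) is left out of the scenario for that event.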

A failure scenario is constructed from a set of trajectories by averaging the trigger times of events. Figure 8.3 gives an example of how two failure trajectories are combined into a single failure scenario. Event $e_1$ occurs twice along both failure trajectories and therefore occurs twice in the failure scenario. The trigger time for the $i$th occurrence of $e_1$ in the failure scenario is the average of the trigger times of the $i$th occurrences of $e_1$ in the two trajectories. Event $e_4$ only appears along the first trajectory and is thus excluded from the scenario (it is assumed that $e_4$ has trigger time $\infty$ in the second trajectory, which makes the average trigger time $\infty$ as well). Figure 8.4 shows an actual failure scenario for the transportation problem.

² For a goal formula $\mathcal{P}_{\leq\theta}[\varphi]$, the immediate rewards are negated.

Trajectory 1 | Trajectory 2 | Failure scenario
$e_1$ @ 1.2 | $e_1$ @ 1.6 | $e_1$ @ 1.4
$e_2$ @ 3.0 | $e_2$ @ 3.2 | $e_2$ @ 3.1
$e_1$ @ 4.5 | $e_3$ @ 4.4 | $e_1$ @ 4.5
$e_3$ @ 4.8 | $e_1$ @ 4.5 | $e_3$ @ 4.6
$e_4$ @ 6.8 | $e_5$ @ 6.4 | $e_5$ @ 6.7
$e_5$ @ 7.0 | - | -

Figure 8.3: Example of failure scenario construction from two failure trajectories.

$e_i$ @ $t_i$ | Label
(enter-taxi me pgh-taxi cmu) @ 0.909091 | $a_1$
(depart-taxi me pgh-taxi cmu pgh-airport) @ 1.81818 | $a_2$
(fill-plane plane pgh-airport) @ 13.284 | $e_3$
(arrive-taxi pgh-taxi cmu pgh-airport) @ 30.0722 | $e_4$
(leave-taxi me pgh-taxi pgh-airport) @ 30.9813 | $a_5$
(lose-package me pkg pgh-airport) @ 44.0285 | $e_6$

Figure 8.4: Failure scenario for the policy in Figure 8.2(b) associated with the “fill-plane” event.
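The averaging rule can be sketched as follows, with each trajectory reduced to a list of `(event, trigger time)` pairs. An occurrence missing from any trajectory counts as trigger time $\infty$ and is therefore dropped. Ordering the resulting scenario by averaged trigger time is an assumption of the sketch. The usage test takes the trigger times of Figure 8.3 as read from the figure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Occurrence = Tuple[str, float]  # (event, trigger time)


def failure_scenario(trajectories: Sequence[Sequence[Occurrence]]) -> List[Occurrence]:
    """Average the trigger time of the i-th occurrence of each event over all trajectories."""
    per_traj: List[Dict[str, List[float]]] = []
    for traj in trajectories:
        times: Dict[str, List[float]] = defaultdict(list)
        for e, t in traj:
            times[e].append(t)  # trigger times of e, in order of occurrence
        per_traj.append(times)
    scenario: List[Occurrence] = []
    for e in set().union(*per_traj):
        # Occurrences missing from some trajectory would average to infinity: skip them.
        count = min(len(times.get(e, [])) for times in per_traj)
        for i in range(count):
            avg = sum(times[e][i] for times in per_traj) / len(per_traj)
            scenario.append((e, avg))
    return sorted(scenario, key=lambda occ: occ[1])
```

Applied to the two trajectories of Figure 8.3, the sketch yields $e_1$ at 1.4, $e_2$ at 3.1, $e_1$ at 4.5, $e_3$ at 4.6, and $e_5$ at 6.7, with $e_4$ excluded.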


8.3.2  Planning  with  Failure  Scenarios 

We select the failure scenario for the event with the lowest value and try to generate a plan for the selected scenario that achieves the goal. If this fails, we try planning for the next worst failure scenario, and continue in this manner until we find a promising repair or run out of failure scenarios.

We plan to neutralize a failure scenario by incorporating the events and timing information of the scenario into the planning problem that is then passed to the temporal deterministic planner. Given a failure scenario $(e_1@t_1, \ldots, e_k@t_k, \ldots, e_n@t_n)$ associated with the event $e_k$, we generate a sequence of states $s_0, \ldots, s_n$, where $s_0$ is the initial state of the original planning problem and $s_i$ for $i > 0$ is the state obtained by applying $e_i$ to state $s_{i-1}$. We can plan to avoid the bad event $e_k$ by generating a planning problem with initial state $s_i$ for $i < k$. By choosing $i$ closer to $k$, we can potentially avoid planning for situations that the current policy already handles well. By choosing $i$ closer to 0, we allow the planner more time to neutralize $e_k$. Our implementation iterates over the possible start states from $i = k - 1$ to $i = 0$. If a solution is found for some $i$, then we do not investigate other possible initial states. For each planning problem generated, we limit the number of search nodes explored by VHPOP. This is necessary because VHPOP takes too long to recognize that a problem lacks a solution, but often finds a solution quickly if one exists. If the search limit is reached, we attempt to plan from an earlier initial state, or try to plan for the next worst failure scenario if we are already at $i = 0$.
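The search order described above (worst scenario first, and start states $s_{k-1}$ down to $s_0$ within a scenario) can be outlined as follows. This is a hedged sketch: `find_repair` and the callback `plan_from` are hypothetical names, and `plan_from(states, i)` stands in for a node-limited VHPOP call on the problem that starts in $s_i$, returning `None` when the search limit is reached.

```python
from typing import Callable, Optional, Sequence, Tuple

# A scenario is (states, k): states s_0..s_n and the index k of the bad event e_k.
Scenario = Tuple[Sequence[object], int]

def find_repair(
    scenarios: Sequence[Scenario],
    plan_from: Callable[[Sequence[object], int], Optional[list]],
) -> Optional[list]:
    """Try failure scenarios worst first; within each, start states s_{k-1}, ..., s_0.

    plan_from(states, i) is a hypothetical stand-in for a node-limited call to the
    deterministic planner starting in states[i]; it returns None when the search
    limit is reached without a plan.
    """
    for states, k in scenarios:
        for i in range(k - 1, -1, -1):     # i = k-1 down to 0
            plan = plan_from(states, i)
            if plan is not None:
                return plan                # stop at the first promising repair
    return None                            # ran out of failure scenarios
```

Because the inner loop starts at $i = k - 1$, a plan from a later start state is always preferred over one from an earlier start state in the same scenario.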

Given an initial state $s_i$, the events following $s_i$ in the failure scenario are incorporated into the planning problem in the form of a set of event dependency trees $\mathcal{T}_i$ and a set of untriggered events $U_i$. The purpose of these two sets is to force the deterministic planner to schedule events in a way consistent with the failure scenario. Each node in an event dependency tree stores an event and a trigger time for the event relative to the parent node (or relative to the initial state for root nodes). The children of a node for an event $e$ represent events that depend on the triggering of $e$ to become enabled. If the deterministic planner schedules the event $e$, then the events that depend on $e$ should be scheduled to follow $e$. The set $U_i$ represents events that are enabled in all states $s_j$ for $j \geq i$ but differ from all events $e_j$ for $j > i$, and these events should not be allowed to trigger between time 0 and $t_n$ in the deterministic planning problem.

We define the sets $\mathcal{T}_i$ and $U_i$ for state $s_i$ recursively. The base case is $\mathcal{T}_n = \emptyset$, with $U_n$ containing all events enabled in $s_n$ (a failure scenario imposes no scheduling constraints after the last event of the scenario). For $i < n$, let $\delta = t_{i+1} - t_i$ (or simply $t_1$ for $i = 0$) and construct a tree $T_i$ consisting of a single node with event $e_{i+1}$ and trigger time $\delta$. For each tree $T \in \mathcal{T}_{i+1}$:

• if the event at the root of $T$ is an action, then add $T$ to $\mathcal{T}_i$ (there is no reason to force an action to follow the triggering of an event, because actions are truly under the control of the planner).

• if the event at the root of $T$ is enabled in $s_i$, then add $\delta$ to the trigger time of the root node and add the resulting tree to $\mathcal{T}_i$.

• if the event at the root of $T$ is disabled in $s_i$, then add $T$ to the children of $T_i$ (if the root event of $T$ is disabled in $s_i$, then it is enabled by $e_{i+1}$ according to the failure scenario).

Let $U$ be the set of events $e \in U_{i+1}$ not enabled in $s_i$. Then $U_i = U_{i+1} \setminus (U \cup \{e_{i+1}\})$. Finally, add $T_i$ to $\mathcal{T}_i$.
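The recursive construction of $\mathcal{T}_i$ and $U_i$ translates directly into a backward loop over the scenario. The sketch below is illustrative rather than the thesis implementation. It assumes that each state is represented by the set of events whose enabling condition holds in it, and `Node`, `build_constraints`, and `is_action` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Set

@dataclass
class Node:
    """Event dependency tree node; the trigger time is relative to the parent
    (or to the start state for a root)."""
    event: str
    time: float
    children: List["Node"] = field(default_factory=list)

def build_constraints(
    events: Sequence[str],          # e_1..e_n, stored at indices 0..n-1
    times: Sequence[float],         # t_1..t_n, stored at indices 0..n-1
    states: Sequence[Set[str]],     # s_0..s_n, each as the set of events enabled in it
    is_action: Callable[[str], bool],
):
    """Return (trees, untriggered) with trees[i] = T_i and untriggered[i] = U_i."""
    n = len(events)
    trees: List[List[Node]] = [[] for _ in range(n + 1)]       # T_n is empty
    untriggered: List[Set[str]] = [set() for _ in range(n + 1)]
    untriggered[n] = set(states[n])                             # U_n: events enabled in s_n
    for i in range(n - 1, -1, -1):
        s_i = states[i]
        delta = times[i] - times[i - 1] if i > 0 else times[0]  # t_{i+1} - t_i, or t_1
        new_tree = Node(events[i], delta)                       # single node for e_{i+1}
        for tree in trees[i + 1]:
            if is_action(tree.event):
                trees[i].append(tree)            # actions are under the planner's control
            elif tree.event in s_i:
                # Still enabled in s_i: shift the root's trigger time back by delta.
                trees[i].append(Node(tree.event, tree.time + delta, tree.children))
            else:
                new_tree.children.append(tree)   # enabled by e_{i+1}
        disabled = {e for e in untriggered[i + 1] if e not in s_i}
        untriggered[i] = untriggered[i + 1] - (disabled | {events[i]})
        trees[i].append(new_tree)
    return trees, untriggered
```

For a two-event scenario in which $e_2$ becomes enabled only after $e_1$ triggers, the result for $i = 0$ is a single tree rooted at $e_1$ with $e_2$ as a child, as the third rule prescribes.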

For the scenario shown in Figure 8.4 and the state right before the “fill-plane” event ($i = 2$), there are three event trees: one with $e_4@28.254$ as the sole node, one with $e_3@11.4658$ as the sole node, and a final tree with $e_5$ at the root and $e_6@13.0472$ as a child node. The set $U_2$ contains the following two events:

(depart-plane  plane  pgh-airport  mpls-airport) 

(move-taxi  mpls-taxi  mpls-airport) 

This means that if we start planning from state $s_2$, we are not allowed to schedule either of these two events until after the trigger time for the last event in the failure scenario.

We incorporate the event trees in $\mathcal{T}_i$ that have an exogenous event at the root into the deterministic planning problem by forcing all the events in these trees to be part of the plan. Events at root nodes are scheduled to become enabled at time 0 and to trigger at the time stored at the node, and events at non-root nodes are scheduled to become enabled at the time the parent event triggers and to trigger $t$ time units after the parent event triggers ($t$ being the time stored at the node). The deterministic planner is allowed to disable the effects of a forced event by disabling its enabling condition. This can easily be handled in a POCL framework by treating the enabling condition as an effect condition that can be disabled by means of confrontation (Weld 1994). The sets $U_i$ impose further scheduling constraints for the deterministic planner.
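The timing rules for forced events can be made concrete by flattening the event trees into absolute enable and trigger times. This is a sketch under the assumption that the caller passes only the trees in $\mathcal{T}_i$ whose root is an exogenous event; `Node` and `forced_schedule` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    event: str
    time: float                       # trigger delay relative to the parent (or to time 0)
    children: List["Node"] = field(default_factory=list)

def forced_schedule(roots: List[Node]) -> List[Tuple[str, float, float]]:
    """Return (event, enabled_at, triggers_at) for every event in the given trees.

    Roots become enabled at time 0 and trigger at their stored time; a child
    becomes enabled when its parent triggers and triggers `time` units later.
    """
    schedule = []
    stack = [(root, 0.0) for root in roots]
    while stack:
        node, enabled_at = stack.pop()
        triggers_at = enabled_at + node.time
        schedule.append((node.event, enabled_at, triggers_at))
        stack.extend((child, triggers_at) for child in node.children)
    return sorted(schedule, key=lambda entry: entry[2])
```

The result is ordered by trigger time, which is the order in which the deterministic planner would see the forced events occur.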

Once  a  plan  is  found  for  a  failure  scenario,  we  extract  a  set  of  training  examples  from  the  plan  as 
described  in  Section  8.2.  We  update  the  current  policy  by  incorporating  the  additional  training  examples 
into  the  decision  tree  using  incremental  decision  tree  induction  (Utgoff  et  al.  1997).  This  requires  that  we 
store  the  old  training  examples  in  the  leaf  nodes  of  the  decision  tree,  and  some  additional  information  in 
the  decision  nodes,  but  we  avoid  having  to  generate  the  entire  decision  tree  from  scratch.  We  adapt  the 
algorithm  of  Utgoff  et  al.  to  our  particular  situation  by  always  giving  precedence  to  new  training  examples 
over  old  ones  in  case  of  inconsistencies,  and  by  restructuring  the  decision  tree  only  after  incorporating  all 
new  training  examples  (the  latter  is  done  for  efficiency  and  does  not  change  the  outcome). 

Figure 8.5(a) shows a plan for the failure scenario in Figure 8.4, with the state after the “enter-taxi” action as the initial state for the planning problem. Note, in particular, the “fill-plane” event, which the deterministic planner has been forced to schedule at time 12.3749. The planner uses the “make-reservation” action to counter the adverse effects of the “fill-plane” event. The policy after incorporating the training examples generated from the plan is shown in Figure 8.5(b). The entire right subtree of the repaired policy is the same as for the initial policy, so it does not have to be regenerated.
(a) Plan for failure scenario.

0: (leave-taxi me pgh-taxi cmu) [1]
0: (depart-plane plane pgh-airport mpls-airport) [60]
0: (fill-plane plane pgh-airport) [12.3749]
1: (make-reservation me plane cmu) [1]
2: (enter-taxi me pgh-taxi cmu) [1]
3: (depart-taxi me pgh-taxi cmu pgh-airport) [1]
4: (arrive-taxi pgh-taxi cmu pgh-airport) [20]
24: (leave-taxi me pgh-taxi pgh-airport) [1]
25: (check-in me plane pgh-airport) [1]
60: (arrive-plane plane pgh-airport mpls-airport) [90]
150: (enter-taxi me mpls-taxi mpls-airport) [1]
151: (depart-taxi me mpls-taxi mpls-airport honeywell) [1]
152: (arrive-taxi mpls-taxi mpls-airport honeywell) [20]
172: (leave-taxi me mpls-taxi honeywell) [1]

(b) Repaired policy.



Figure  8.5:  (a)  Plan  for  failure  scenario  in  Figure  8.4  using  the  second  state  as  initial  state,  and  (b)  the  policy  after 
incorporating  the  training  examples  from  the  plan  in  (a).  If  the  taxi  is  at  CMU  but  we  are  not,  then  it  is  assumed  that 
we  are  in  the  taxi.  In  that  case,  we  leave  the  taxi  ( a% )  if  we  do  not  have  a  reservation.  The  right  subtree  of  the  root 
node  is  identical  to  that  of  the  initial  policy  in  Figure  8.2(b),  and  is  only  indicated  by  three  vertical  dots. 


8.4  Statistical  Policy  Comparison 

The procedure Better-Policy is supposed to compare the policies $\pi$ and $\pi'$, returning the better of the two. Given a UTSL goal condition $\mathcal{P}_{\geq\theta}[\varphi]$, let $p$ be the probability measure of the set of trajectories that satisfy $\varphi$ for model $\mathcal{M}[\pi]$ and let $p'$ be the probability measure of the set of trajectories that satisfy $\varphi$ for model $\mathcal{M}[\pi']$. We can use a statistical approach to implementing Better-Policy such that it returns $\pi$ with high probability if $p$ is significantly greater than $p'$, $\pi'$ with high probability if $p$ is significantly less than $p'$, and either of the two policies with roughly equal probability if $p$ is close to $p'$.

The problem of comparing two policies can be posed as a hypothesis testing problem. We want to test the hypothesis $H: p \geq p'$ against the alternative hypothesis $K: p < p'$. Acceptance of $H$ should result in us choosing $\pi$ over $\pi'$, while acceptance of $K$ would lead us to prefer $\pi'$. We can use a technique described by Wald (1945, p. 165) to transform this into a hypothesis testing problem that can be solved using techniques described in previous chapters. The basic idea is to pair the observations made for the two model checking problems. Let $x_1, \ldots, x_m$ be the observations obtained by verifying $\varphi$ over sample trajectories for $\mathcal{M}[\pi]$ and let $x'_1, \ldots, x'_{m'}$ be the observations obtained by verifying $\varphi$ over sample trajectories for $\mathcal{M}[\pi']$. We create pairs $(x_i, x'_i)$ of the first $\min(m, m')$ observations. Each pair $(1, 0)$ is counted as an observation $y_i = 1$ of a Bernoulli variate $Y_i$ for a new hypothesis testing problem, and a pair $(0, 1)$ is counted as an observation $y_i = 0$. Pairs with matching observations are discarded. It is easy to verify that if $\pi$ and $\pi'$ are equally good, then $\Pr[Y_i = 1] = 0.5$ (cf. Wald 1945, p. 166). Let $p = \Pr[Y_i = 1]$. We test $H$ against $K$ by testing the hypothesis $H: p \geq 0.5$ against the alternative hypothesis $K: p < 0.5$ using the observations $y_i$.

Better-Policy(M, s_0, Φ, π, π')
    k ⇐ min(|x|, |x'|)    ▷ x are observations for π and x' are observations for π'
    d ⇐ 0, n ⇐ 0
    for i ⇐ 1 to k do
        if x_i = 1 ∧ x'_i = 0 then
            d ⇐ d + 1, n ⇐ n + 1
        else if x_i = 0 ∧ x'_i = 1 then
            n ⇐ n + 1
    if 2d ≥ n then
        return π     ▷ p-value F(n − d; n, 0.5)
    else
        return π'    ▷ p-value F(d; n, 0.5)

Algorithm 8.3: Statistical comparison of two policies.

For efficiency, we can reuse the observations already generated by Test-Policy. This gives us a predetermined sample of size $n$, where $n$ is the number of paired observations that differ in value. We can use the same approach as described in Chapter 7 for “black-box” probabilistic verification to test $H: p \geq 0.5$ against $K: p < 0.5$ using a predetermined sample. This gives us a p-value for the decision we make. With $\sum_{i=1}^{n} y_i = d$, the p-value for $H$ is $F(n - d; n, 0.5)$, while the p-value for $K$ is $F(d; n, 0.5)$. Because the threshold is 0.5, the lower p-value is obtained by accepting $H$ if and only if at least half of the observations are positive. Algorithm 8.3 shows code for implementing the procedure Better-Policy in this way.
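Algorithm 8.3 is short enough to run directly. The Python sketch below assumes that $F(x; n, p)$ denotes the binomial cumulative distribution function, which is how the p-values above use it, and that observations are encoded as 0/1 lists. The names `binom_cdf` and `better_policy` are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_cdf(x: int, n: int, p: float = 0.5) -> float:
    """F(x; n, p): probability of at most x successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

def better_policy(obs, obs_prime):
    """Paired comparison of 0/1 verification outcomes for pi (obs) and pi' (obs_prime).

    Returns ("pi", p-value) if at least half of the discordant pairs favor pi,
    otherwise ("pi'", p-value), mirroring Algorithm 8.3.
    """
    d = n = 0
    for x, x_prime in zip(obs, obs_prime):   # pairs only the first min(m, m') observations
        if x == 1 and x_prime == 0:
            d += 1
            n += 1
        elif x == 0 and x_prime == 1:
            n += 1
    if 2 * d >= n:
        return "pi", binom_cdf(n - d, n)
    return "pi'", binom_cdf(d, n)

print(better_policy([1, 1, 1, 0, 1], [0, 0, 1, 0, 0]))  # prints ('pi', 0.125)
```

Because `zip` stops at the shorter list, unmatched observations of the longer sample are ignored, exactly as the $\min(|x|, |x'|)$ bound in the algorithm prescribes.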


8.5  Formal  Properties  of  Planning  Algorithm 

When  describing  a  new  planning  algorithm,  it  is  common  to  consider  soundness  and  completeness  of  the 
algorithm.  A  planning  algorithm  is  sound  if  every  plan  that  it  generates  is  a  valid  solution  to  the  planning 
problem  it  is  given.  The  algorithm  is  complete  if  it  generates  a  plan  for  every  problem  that  has  a  solution. 
A  planning  algorithm  that  is  both  sound  and  complete  is  guaranteed  to  produce  a  valid  plan  whenever  a 
solution exists, and it is guaranteed not to produce a plan for problems that lack solutions.

Our proposed planning algorithm is sound, so long as Test-Policy never accepts a policy that does not satisfy the goal condition. Since we rely on statistical techniques, our planner can give only probabilistic guarantees regarding soundness. For a given policy $\pi$, our statistical model checking algorithm guarantees that $\Pr[\mathcal{M}[\pi], s_0 \vdash \Phi \mid \mathcal{M}[\pi], s_0 \not\models \Phi] \leq \beta$, where $\mathcal{M}[\pi], s_0 \vdash \Phi$ denotes that the verification algorithm accepts the goal condition $\Phi$. This means that, in each iteration of the algorithm, we are guaranteed that a policy $\pi$ is accepted with probability at most $\beta$ if $\pi$ is not satisfactory (i.e. $\mathcal{M}[\pi], s_0 \not\models \Phi$). Since the algorithm halts once we accept a policy, we get an overall bound of $\beta$ on the probability that Find-Policy returns an unsatisfactory policy. We say that the planning algorithm is $\beta$-sound.

By adopting hill-climbing for policy search, we sacrifice completeness. Even with exhaustive search of the policy space, however, we may still not be able to guarantee completeness. This is because the statistical model checking algorithm could fail to identify a satisfactory policy. We are guaranteed that $\Pr[\mathcal{M}[\pi], s_0 \vdash \Phi \mid \mathcal{M}[\pi], s_0 \models \Phi] \geq 1 - \alpha$. If we consider each policy at least once, and there are $k$ satisfactory policies, then the probability is at least $1 - \alpha^k$ that some policy is accepted as a solution. This does not mean that the accepted policy is satisfactory (that is a matter of soundness rather than completeness). We can increase the probability of producing a policy by visiting policies multiple times during the search. If, for example, we could guarantee that a satisfactory policy was visited an infinite number of times, then the algorithm would produce a policy with probability 1, which in the limit would give us a complete algorithm, assuming that each policy verification is carried out independently. Without an independence assumption, we could guarantee only a $1 - \alpha$ probability of accepting some policy (cf. Theorem 5.4). For instance, the independence assumption would be violated if we reused sample trajectories for the verification of multiple policies.³ This leads to a $(1 - \alpha)$-complete planning algorithm.
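The $1 - \alpha^k$ bound above follows from a short calculation. Assume that the $k$ satisfactory policies $\pi_1, \ldots, \pi_k$ are each verified at least once, with independent verifications, and write $\mathcal{M}[\pi_j], s_0 \nvdash \Phi$ for rejection. If no policy is accepted, then in particular every satisfactory policy was rejected, so

$$\Pr[\text{no policy accepted}] \;\le\; \prod_{j=1}^{k} \Pr[\mathcal{M}[\pi_j], s_0 \nvdash \Phi \mid \mathcal{M}[\pi_j], s_0 \models \Phi] \;\le\; \alpha^k .$$

For example, with $\alpha = 0.01$ and $k = 2$, some policy is accepted with probability at least $1 - 10^{-4} = 0.9999$. Without independence the product step fails, which is why only the $1 - \alpha$ bound of a single verification remains.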

8.6  Experimental  Results 

The  results  in  this  section  were  generated  on  a  PC  with  a  650  MHz  Pentium  III  processor  running  Linux.  A 
search  limit  of  10,000  explored  nodes  was  set  for  the  deterministic  planner  VHPOP.  We  used  the  additive 
heuristic described by Younes and Simmons (2002a, 2003), which is an adaptation for POCL planning of the additive heuristic for state space planning first proposed by Bonet et al. (1997).

³ Younes and Musliner (2002) describe a probabilistic extension of CIRCA where policies are constructed incrementally. While reuse of sample trajectories is not mentioned explicitly by Younes and Musliner, it is present in the implementation of their approach.

                 Event                                            Rank    Value   Mean value   Trajectories
first policy     (fill-plane plane pgh-airport)                    1.0    -24.1        -0.36           41.8
                 (lose-package me pkg mpls-airport)                2.0    -14.7        -0.76           15.0
                 (lose-package me pkg pgh-airport)                 3.2     -6.8        -0.15           36.4
second policy    (lose-package me pkg mpls-airport)                1.0    -94.3        -0.70          101.6
                 (arrive-plane plane pgh-airport mpls-airport)     2.4    -19.9         0.04           99.4
                 (move-taxi mpls-taxi mpls-airport)                2.6    -18.2         0.06          107.4

Table 8.2: Top ranking “bugs” for the first two policies of the transportation problem. All numbers are averages over five runs. A rank of 1.0 means that a “bug” was determined to be the worst in all five runs.

Consider the transportation problem described earlier in this chapter. There are several things that can go wrong with the initial policy in Figure 8.2(b): the plane can become full or depart before we get to the Pittsburgh airport to check in, the Minneapolis taxi can be serving other customers when we arrive at the Minneapolis airport, and the package can get lost if we stand with it at an airport for too long. The top part of Table 8.2 shows the worst three “bugs” for the initial policy as determined by the sample trajectory analysis. The numbers in the table are averages over five runs with different random seeds, and we used the parameters $\alpha = \beta = 0.01$ (error probability) and $\delta = 0.005$ (half-width of indifference region) with the verification algorithm. By a wide margin, the worst bug is that the plane becomes full before we have a chance to check in. Losing the package at Minneapolis airport comes in second place. Note that the package is more often lost at Pittsburgh airport than at Minneapolis airport, but this bug is not ranked as high because it tends to happen only when the plane has already been filled. The value of the state where the “lose-package” event at Pittsburgh airport occurs is already close to $-1$ due to an earlier “fill-plane” event, resulting in a mean value of only $-0.15$ for the “lose-package” event at Pittsburgh airport.

The “fill-plane” bug is repaired by making a reservation before leaving CMU, resulting in the policy shown in Figure 8.5(b). The top three bugs for this policy are shown in the bottom part of Table 8.2. Now, losing the package at Minneapolis airport appears to be the only severe bug left. Note that losing the package at Pittsburgh airport no longer ranks in the top three because the repair for the “fill-plane” bug fortuitously took care of this bug as well. The package is lost at Minneapolis airport because the taxi is not there when we arrive, and the repair found by the planner is to store the package in a safety box until the taxi returns. The policy resulting from this repair satisfies the goal condition, so we are done.


[Table 8.3. Columns: Verify, Analyze, and Repair for the first policy; Verify, Analyze, and Repair for the second policy; Verify for the third policy. Rows: problem 1 and problem 2, each with $\alpha = \beta = 10^{-1}$, $10^{-2}$, and $10^{-4}$.]

Table 8.3: Running times, in seconds, for different stages of the planning algorithm for the original transportation problem (problem 1) and the modified transportation problem (problem 2) with varying error bounds ($\alpha$ and $\beta$). All numbers are averages over five runs.


Table 8.3 shows running times for the different parts of the planning algorithm on two variations of the transportation problem. The first problem uses the original transportation domain, while the second problem
replaces  the  possibility  of  storing  a  package  with  an  action  for  reserving  a  taxi  and  uses  the  probability 
threshold  0.85  instead  of  0.9.  We  can  see  that  the  sample  trajectory  analysis  takes  very  little  time.  The  time 
for  the  first  repair  is  about  the  same  for  both  problems,  which  is  not  surprising  as  exactly  the  same  repair 
applies  in  both  situations.  The  second  repair  takes  longer  for  the  second  problem  because  we  have  to  go 
further back in the failure scenario to find a state where we can apply the taxi reservation action
so that it has the desired effects. The planner tries each initial state for a failure scenario before considering a
lower  ranked  scenario.  The  search  limit  determines  the  amount  of  effort  that  is  spent  on  finding  a  solution 
for  a  specific  initial  state  before  proceeding  with  the  next  alternative.  We  observe  that  the  sample  trajectory 
analysis  finds  the  same  major  bugs  despite  random  variation  in  the  sample  trajectories  across  runs  and 
varying  error  bounds.  Verification  takes  longer  for  the  second  policy  for  problem  2  because  the  policy  is 
close  to  satisfactory.  In  all  other  cases,  the  policy  is  either  clearly  satisfactory  or  clearly  unsatisfactory. 

There  is  no  guarantee  that  each  repair  step  takes  us  any  closer  to  a  solution.  We  currently  only  take 
the  most  recent  trajectories  into  account  in  the  failure  analysis,  which  makes  it  possible  to  reintroduce  a 
previous  bug  in  an  attempt  to  address  a  new  bug.  It  is  also  not  clear  when  to  give  up  on  a  failure  scenario, 
and imposing a fixed search limit per attempt appears arbitrary. We believe that the failure analysis could be
more  useful  as  an  aid  to  human  system  analysts  and  engineers  designing  stochastic  systems,  as  the  failure 
scenarios  represent  a  convenient  summary  of  a  large  number  of  trajectories. 


Chapter  9 


Decision  Theoretic  Planning 


In decision theoretic planning, rewards are introduced that represent positive or negative value to a decision
maker,  who  has  to  decide  on  a  course  of  action  in  light  of  uncertainty.  For  example,  there  is  a  small  chance 
that  we  win  $1,000,000  on  the  lottery,  but  each  ticket  costs  $1.  The  objective  for  the  decision  maker  is, 
roughly  speaking,  to  maximize  expected  reward. 

We  introduce  the  generalized  semi-Markov  decision  process  (GSMDP),  based  on  the  GSMP  model  of 
discrete  event  systems,  as  a  model  for  decision  theoretic  planning  with  asynchronous  events  and  actions. 
To  solve  a  GSMDP,  we  present  an  approximation  technique  that  transforms  an  arbitrary  GSMDP  into  a 
continuous-time  Markov  decision  process  (MDP).  Each  non-exponential  delay  distribution  in  the  GSMDP 
is  approximated  by  a  continuous  phase-type  distribution  (see  Section  2.1.3).  The  resulting  continuous-time 
MDP  can  then  be  solved  using  standard  solution  techniques  such  as  value  iteration.  We  demonstrate  our 
approximation  technique  on  models  of  different  size  and  complexity  and  we  show  that  the  introduction  of 
phases  indirectly  allows  us  to  take  into  account  the  time  spent  in  a  state  when  selecting  actions,  which  can 
lead  to  policies  with  higher  expected  reward  than  if  we  make  selections  based  only  on  the  current  state. 

9.1  Generalized  Semi-Markov  Decision  Processes 

The generalized semi-Markov process (GSMP), described in Section 2.3.3, is an established formalism in queuing theory for modeling continuous-time stochastic discrete event systems. We add a decision dimension to the formalism by distinguishing a subset of the events as controllable and introducing rewards, thereby obtaining the generalized semi-Markov decision process (GSMDP). We limit our attention to time-homogeneous models with finite state and event sets. For simplicity, we assume that event trigger time distributions are state independent.

9.1.1  Actions,  Policies,  and  Rewards 

As in Chapter 8, we associate an enabling condition $\phi_e$ with each event $e$ and identify a set $A \subseteq E$ of controllable events, or actions. The remaining events $E \setminus A$ are referred to as exogenous events. An arbitrary event $e$, which can be either an action or an exogenous event, is disabled in any state $s$ such that the enabling condition $\phi_e$ does not hold in $s$. An exogenous event $e$ is always enabled in a state $s$ if the event's enabling condition $\phi_e$ holds in $s$. For an action $a$, on the other hand, satisfaction of the enabling condition is only a necessary condition for $a$ to be enabled, and $a$ can be kept disabled in $s$ even if $\phi_a$ holds. A decision maker, or agent, can influence system behavior during execution by enabling and disabling actions at will.

A control policy, denoted $\pi$, determines which action or set of actions should be enabled in any given situation during execution. We allow the action choice to depend on the current state of the process, as well as its entire execution history. The execution history can be captured by a vector $\mathbf{u}$, with an element $u_e$ for each event $e$ recording the time that $e$ has remained enabled without triggering. The situation space for a process with state space $S$ and event set $E$ is therefore the set $\mathcal{O} = S \times [0, \infty)^{|E|}$. A policy is a mapping from situations to sets of actions: $\pi: \mathcal{O} \to 2^A$. In situation $o = (s, \mathbf{u})$, events $E_o^{\pi} = E_s \setminus (A \setminus \pi(o))$ are enabled, where $E_s$ is the set of events whose enabling condition holds in $s$; i.e. actions not in $\pi(o)$ are disabled. The choice $\pi(o) = \emptyset$ represents idleness. Note that the current situation changes continuously as time progresses, which means that the action choice could change continuously as well. In practice, it can be useful to restrict the size of the action sets that a policy can keep enabled. For example, in a single agent system, we would typically allow at most one action to be enabled at any time.

While in theory it could be beneficial to change the action choice continuously in certain cases, it can hardly be considered practically feasible to do so. We will limit our attention to piecewise constant policies, where the action choice is required to remain constant for a duration of time before it can be changed. We can represent such a policy with a mapping $\tau$ from situations to positive distribution functions, in addition to the mapping $\pi$. At the triggering of an event, we find ourselves in situation $o$. We enable actions $\pi(o)$ at this point, and keep this choice for a duration of time governed by $\tau(o)$ if no event triggers first. The pair $(\pi, \tau)$ represents a piecewise constant policy. In some situations, as we will see, it is sufficient to consider stationary policies, where the action choice is permitted to depend only on the current state and not in any other way on the execution history of the process.
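The definitions of situations, policies, and the enabled set $E_o^{\pi}$ can be sketched in Python. The representation is an assumption made for illustration: a state is a tuple of booleans, $\tau(o)$ is modeled as a function that draws a holding time from a random number generator rather than as a distribution function, and `enabled_events` and `decide` are hypothetical names.

```python
import random
from typing import Callable, Dict, Optional, Set, Tuple

State = Tuple[bool, ...]
Situation = Tuple[State, Dict[str, float]]   # o = (s, u): state plus elapsed enabled times

def enabled_events(o: Situation,
                   enabling: Dict[str, Callable[[State], bool]],
                   actions: Set[str],
                   pi: Callable[[Situation], Set[str]]) -> Set[str]:
    """E_o^pi: events whose enabling condition holds in s, minus actions not in pi(o)."""
    s, _u = o
    e_s = {e for e, phi in enabling.items() if phi(s)}
    return e_s - (actions - pi(o))

def decide(o: Situation,
           enabling: Dict[str, Callable[[State], bool]],
           actions: Set[str],
           pi: Callable[[Situation], Set[str]],
           tau: Callable[[Situation], Callable[[random.Random], float]],
           rng: Optional[random.Random] = None) -> Tuple[Set[str], float]:
    """One decision of a piecewise constant policy (pi, tau): the enabled events and
    how long to keep this choice if no event triggers first (a draw from tau(o))."""
    rng = rng if rng is not None else random.Random()
    return enabled_events(o, enabling, actions, pi), tau(o)(rng)
```

Note that an action chosen by $\pi(o)$ whose enabling condition does not hold is still excluded, because it never enters $E_s$; this matches the enabling condition being a necessary condition for actions.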

In addition to actions, we specify a reward structure to obtain a GSMDP. We assume a traditional reward structure with a lump sum reward $k_e(s, s')$ associated with the transition from state $s$ to $s'$ caused by the triggering of event $e$, and a continuous reward rate $c_{A'}(s)$ associated with the set of actions $A' \subseteq A$ being enabled in $s$.

Example 9.1. Consider a network of two computers that each can be either up or down. With each computer $i$ we associate a crash event $c_i$, enabled when computer $i$ is up, and a reboot action $r_i$, enabled when computer $i$ is down. The decision maker plays the role of a system administrator in this example. We can associate an action-independent reward rate of $c \in \{0, 1, 2\}$ with states where $c$ machines are up. A reasonable policy for this GSMDP would be to enable reboot action $r_i$ whenever machine $i$ is down. If we can reboot only one machine at a time, due to resource constraints, we could choose to reboot a machine as soon as it crashes. This is reasonable if the reboot time distribution for each computer is memoryless. If reboot time distributions are not memoryless and one machine crashes while we are rebooting another machine, then it may be better to complete the current reboot action before switching to reboot the machine that just crashed.
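Example 9.1 is small enough to write out. The sketch below encodes the states, enabling conditions, and reward rate, plus one hypothetical history-dependent policy that uses the elapsed times $u_e$ to finish a reboot already in progress. Event names such as `c1` and `r1` stand for $c_1$ and $r_1$, and all function names are illustrative.

```python
from itertools import product
from typing import Dict, Set, Tuple

State = Tuple[bool, bool]                   # (machine 1 up?, machine 2 up?)
STATES = list(product([True, False], repeat=2))
EVENTS = ["c1", "c2", "r1", "r2"]           # crash events c_i and reboot actions r_i
ACTIONS = {"r1", "r2"}

def condition_holds(event: str, s: State) -> bool:
    """Enabling condition: c_i needs machine i up, r_i needs machine i down."""
    i = int(event[1]) - 1
    return s[i] if event[0] == "c" else not s[i]

def reward_rate(s: State) -> int:
    """Action-independent reward rate: the number of machines that are up."""
    return sum(s)

def one_reboot_at_a_time(s: State, u: Dict[str, float]) -> Set[str]:
    """Hypothetical history-dependent policy: finish a reboot already in progress
    (u[r] > 0 means reboot r has been enabled for u[r] time units), else start
    rebooting the lowest-numbered machine that is down."""
    down = [f"r{i + 1}" for i in range(2) if not s[i]]
    in_progress = [r for r in down if u.get(r, 0.0) > 0.0]
    if in_progress:
        return {in_progress[0]}
    return {down[0]} if down else set()
```

With non-memoryless reboot times, the elapsed time $u_{r_2} > 0$ is exactly the information that lets this policy keep rebooting machine 2 after machine 1 crashes; with exponential reboot times that information would carry no value.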

9.1.2  Optimality  Criteria 

We will now derive the “Bellman equation” for GSMDPs with piecewise constant policies. The general case leads to a recurrence that we do not expect to be solvable exactly. If all delay distributions are exponential, however, a GSMDP is simply a continuous-time MDP. We show how the recurrence equation for such models can be solved using value iteration. This result is relevant for Section 9.2, where we present a technique for approximating a GSMDP that has general delay distributions with one where all delays are exponentially distributed.

We consider two optimality criteria, expected finite-horizon total reward and expected infinite-horizon discounted reward, both of which can be represented by a universally enabled event that terminates execution in the GSMDP framework. A finite planning horizon can be represented by an event with a deterministic distribution. In the infinite-horizon case, reward earned $t$ time units into the future is discounted by a factor $\gamma^t$. This is equivalent to having a termination event with delay distribution $\mathrm{Exp}(\alpha)$, such that $\gamma = e^{-\alpha}$ (Howard 1960, p. 114). We thus represent termination by an event $e_\perp$ that always leads to an absorbing state $s_\perp$. No reward is earned after termination, so $c_{A'}(s_\perp)$ is zero for all action sets $A'$.
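The equivalence between discounting and an exponentially distributed termination time can be seen in one line. Assume the termination time $T \sim \mathrm{Exp}(\alpha)$ is independent of the reward process and let $r(t)$ denote the reward rate earned at time $t$. Then $\Pr[T > t] = e^{-\alpha t} = \gamma^t$, and

$$\mathbb{E}\left[\int_0^T r(t)\,dt\right] = \int_0^\infty \mathbb{E}[r(t)]\,\Pr[T > t]\,dt = \int_0^\infty \gamma^t\,\mathbb{E}[r(t)]\,dt .$$

A lump sum reward earned at time $t$ is likewise collected only if $T > t$, which again weights it by $\gamma^t$. For a constant rate $c$, both sides equal $c/\alpha$.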

To express the expected future reward for a situation $o = (s, \mathbf{u})$, given a fixed piecewise constant policy $(\pi, \tau)$, we consider all possible schedules (assignments of trigger times) of enabled events that are consistent with the situation at hand. Among the enabled events in situation $o$ under policy $(\pi, \tau)$, denoted $E_o^{(\pi,\tau)}$, we count $e_\perp$ of course, but also another virtual event $e_\tau$ with delay distribution $\tau(o)$. The event $e_\tau$ represents the point in time when a change of action choice is scheduled by the piecewise constant policy $(\pi, \tau)$ without a state transition occurring. A schedule for the events is a vector $\mathbf{t}$ of size $|E| + 2$. We can define a probability density function over possible schedules as follows:


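$$f^{(\pi,\tau)}(\mathbf{t}; (s, \mathbf{u})) = \prod_{e \in E_o^{(\pi,\tau)}} h_e(t_e; u_e) \cdot \prod_{e \in E \setminus E_o^{(\pi,\tau)}} \delta(t_e - \infty) \qquad (9.1)$$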


Here, $\delta(t - t_0)$ is the Dirac delta function (Dirac 1927, p. 625) with the property that $\int_{-\infty}^{x} \delta(t - t_0)\,dt$ is 0 for $x < t_0$ and 1 for $x \geq t_0$. In particular, $\int_{-\infty}^{x} \delta(t - \infty)\,dt$ is 0 for any finite $x$ and 1 for $x = \infty$. We use it in (9.1) to assign zero weight to schedules with a finite trigger time for disabled events. Let $t^* = \min \mathbf{t}$ and let $e^* = \arg\min \mathbf{t}$. The expected future reward for a non-terminal situation $o = (s, \mathbf{u})$ can now be defined using the recurrence


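$$
\begin{aligned}
v^{(\pi,\tau)}(\sigma) &= \int_{[0,\infty)^{|\vec{t}|}} \Big( \int_0^{t^*} c_{\pi(\sigma)}(s)\,dt + \sum_{s' \in S} p_{e^*}(s'; s)\big(k_{e^*}(s, s') + v^{(\pi,\tau)}(\Phi(\sigma, \vec{t}, s'))\big) \Big)\, dF^{(\pi,\tau)}(\vec{t}; \sigma) \\
&= \int_{[0,\infty)^{|\vec{t}|}} \Big( t^*\, c_{\pi(\sigma)}(s) + \bar{k}_{e^*}(s) + \sum_{s' \in S} p_{e^*}(s'; s)\, v^{(\pi,\tau)}(\Phi(\sigma, \vec{t}, s')) \Big)\, dF^{(\pi,\tau)}(\vec{t}; \sigma) \qquad (9.2)
\end{aligned}
$$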


where $\bar{k}_e(s) = \sum_{s' \in S} p_e(s'; s)\, k_e(s, s')$ is the expected transition reward in $s$ when a transition is caused by $e$, and $\Phi$ is a function providing the next situation. The next situation is $(s', \vec{u}')$, with $u'_e = u_e + t^*$ if $e$ remains enabled without triggering and $u'_e = 0$ otherwise. Equation 9.2 is the “Bellman equation” for GSMDPs with piecewise constant policies.
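The schedule density (9.1) and the next-situation function $\Phi$ can be illustrated with a small simulation sketch. It rests on several assumptions. Each event is given as a hypothetical sampler that draws a residual trigger time $t_e$ conditional on the event having already been enabled for $u_e$ time units, which is the role played by $h_e(t_e; u_e)$. Disabled events get $t_e = \infty$, mirroring the Dirac delta. The termination event $e_\perp$ and the virtual event $e_\tau$ are treated as ordinary entries. Only exponential and uniform delays are covered.

```python
import math
import random
from typing import Callable, Dict, Set, Tuple

# Hypothetical representation: a sampler maps (age u_e, rng) to a residual
# trigger time, i.e. a draw of T - u_e conditional on T > u_e.
Sampler = Callable[[float, random.Random], float]


def exponential(rate: float) -> Sampler:
    # Memoryless: the age u_e does not matter.
    return lambda u, rng: rng.expovariate(rate)


def uniform(a: float, b: float) -> Sampler:
    def sample(u: float, rng: random.Random) -> float:
        # Conditioning on T > u truncates U(a, b) to [max(a, u), b]; assumes u < b.
        return rng.uniform(max(a, u), b) - u
    return sample


def sample_schedule(samplers: Dict[str, Sampler], ages: Dict[str, float],
                    enabled: Set[str], rng: random.Random) -> Dict[str, float]:
    """Schedule vector t: residual trigger times for enabled events, inf otherwise."""
    return {e: samplers[e](ages[e], rng) if e in enabled else math.inf
            for e in samplers}


def next_ages(schedule: Dict[str, float], ages: Dict[str, float],
              enabled_next: Set[str]) -> Tuple[str, float, Dict[str, float]]:
    """Return (e*, t*, u'): u'_e = u_e + t* for events that were enabled, did not
    trigger, and remain enabled in the next state; all other ages reset to zero."""
    e_star = min(schedule, key=schedule.get)
    t_star = schedule[e_star]
    new_ages = {
        e: ages[e] + t_star
        if e != e_star and math.isfinite(schedule[e]) and e in enabled_next
        else 0.0
        for e in ages
    }
    return e_star, t_star, new_ages
```

In the Foreman's Dilemma of Section 9.3.1, for instance, the failure event would use `uniform(5, b)` and the service action `exponential(10)`. Which events are enabled in the next state depends on $s'$ and is supplied by the caller.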

Equation 9.2 involves a high-dimensional probability integral, which suggests that finding an optimal piecewise constant policy for a GSMDP may be hard in the general case. If all delay distributions are exponential, however, then the GSMDP is just a continuous-time MDP, and the recurrence becomes more manageable. We call this a Markovian GSMDP to stress the event structure. Requiring all delay distributions to be exponential rules out the finite-horizon criterion, which needs a deterministic distribution, but we can handle the infinite-horizon discounted criterion.

The exponential distribution is memoryless, which means that $h_e(t; u_e) = h_e(t)$ for an event $e$ with an exponential delay distribution. As a consequence, there is no value in knowing how long events have been enabled, because the future behavior of the system depends only on the current state $s$. A policy for a Markovian GSMDP can be a mapping from states to sets of actions. There is no need for a $\tau$-component, since the relevant situation does not change as time progresses in a state. This means that we need to consider only the class of stationary policies in order to act optimally.
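Memorylessness is easy to check numerically. The conditional survival $\Pr[T > u + t \mid T > u]$ does not depend on the age $u$ for an exponential delay, but it does for other distributions. The sketch below compares an exponential delay with a Weibull delay. The parameter values are arbitrary choices for illustration.

```python
import math
from typing import Callable

Survival = Callable[[float], float]


def residual_survival(survival: Survival, u: float, t: float) -> float:
    """P(T > u + t | T > u) for a delay T with survival function P(T > x)."""
    return survival(u + t) / survival(u)


def exp_survival(rate: float) -> Survival:
    return lambda x: math.exp(-rate * x)


def weibull_survival(scale: float, shape: float) -> Survival:
    return lambda x: math.exp(-((x / scale) ** shape))


if __name__ == "__main__":
    for u in (0.0, 1.0, 5.0):
        print(u,
              residual_survival(exp_survival(0.5), u, 2.0),          # about e^{-1} for every u
              residual_survival(weibull_survival(8.0, 4.5), u, 2.0))  # shrinks as u grows
```

For the Weibull delay (shape above 1), the hazard rate increases with age. An event that has been enabled longer is therefore more likely to trigger soon, and this is exactly the information a stationary policy for a Markovian GSMDP cannot use.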

In state $s$, with actions $A'$ chosen to be enabled, the events $E_s(A') = E_s \setminus (A \setminus A')$ are enabled, not counting the termination event $e_\perp$, which is enabled in all states. Let $\lambda_e$ denote the rate of the exponential delay distribution associated with event $e$, and let $\alpha$ be the rate of the termination event. The time we spend in state $s$ before an event triggers is exponentially distributed with rate

$$\lambda_{A'}(s) = \alpha + \sum_{e \in E_s(A')} \lambda_e .$$

The probability that event $e$ triggers first is $\lambda_e / \lambda_{A'}(s)$, and the probability that termination occurs before any other event has time to trigger is $\alpha / \lambda_{A'}(s)$. These properties are easily derived for the exponential distribution. They permit us to write the recurrence for a Markovian GSMDP, defining the expected future reward of a state $s$ under a given policy $\pi$, as follows:


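$$
\begin{aligned}
v^{\pi}(s) &= \int_0^\infty \lambda_{\pi(s)}(s)\, e^{-\lambda_{\pi(s)}(s)\,t} \Big( t\, c_{\pi(s)}(s) + \sum_{e \in E_s(\pi(s))} \frac{\lambda_e}{\lambda_{\pi(s)}(s)} \sum_{s' \in S} p_e(s'; s)\big(k_e(s, s') + v^{\pi}(s')\big) \Big)\, dt \\
&= \frac{1}{\lambda_{\pi(s)}(s)} \Big( r_{\pi(s)}(s) + \sum_{e \in E_s(\pi(s))} \sum_{s' \in S} \lambda_e\, p_e(s'; s)\, v^{\pi}(s') \Big)
\end{aligned}
$$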

In the above equation, $r_{A'}(s)$ denotes the quantity $c_{A'}(s) + \sum_{e \in E_s(A')} \lambda_e \sum_{s' \in S} p_e(s'; s)\, k_e(s, s')$, which is essentially the expected reward per time unit in state $s$ until the next state is reached. We can swap the order of the two summations to obtain


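$$
\begin{aligned}
v^{\pi}(s) &= \frac{1}{\lambda_{\pi(s)}(s)} \Big( r_{\pi(s)}(s) + \sum_{s' \in S} \sum_{e \in E_s(\pi(s))} \lambda_e\, p_e(s'; s)\, v^{\pi}(s') \Big) \\
&= \frac{1}{\lambda_{\pi(s)}(s)} \Big( r_{\pi(s)}(s) + \sum_{s' \in S} w_{\pi(s)}(s'; s)\, v^{\pi}(s') \Big), \qquad (9.3)
\end{aligned}
$$

where $w_{A'}(s'; s) = \sum_{e \in E_s(A')} \lambda_e\, p_e(s'; s)$.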
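For a fixed stationary policy, (9.3) is a linear system. Multiplying through by $\lambda_{\pi(s)}(s)$ gives $(\mathrm{diag}(\lambda_\pi) - W_\pi)\,v^\pi = r_\pi$. The NumPy sketch below solves this system directly and also computes the race probabilities $\lambda_e/\lambda_{A'}(s)$ and $\alpha/\lambda_{A'}(s)$ used above. The two-state example, its rates, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

import numpy as np


def race(rates: Dict[str, float], alpha: float) -> Tuple[float, Dict[str, float]]:
    """Exit rate lambda_{A'}(s) = alpha + sum of enabled event rates, and the
    probability that each event (or termination) triggers first."""
    total = alpha + sum(rates.values())
    probs = {e: lam / total for e, lam in rates.items()}
    probs["terminate"] = alpha / total
    return total, probs


def evaluate_policy(lam: np.ndarray, r: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Solve v = (r + W v) / lam from (9.3), i.e. (diag(lam) - W) v = r.
    lam[s]: lambda_{pi(s)}(s); r[s]: r_{pi(s)}(s); W[s, s']: w_{pi(s)}(s'; s)."""
    return np.linalg.solve(np.diag(lam) - W, r)


if __name__ == "__main__":
    alpha = -math.log(0.95)
    # Hypothetical two-state model: state 0 earns reward at rate 2 and leaves at
    # rate 1; state 1 earns nothing and returns to state 0 at rate 3.
    W = np.array([[0.0, 1.0], [3.0, 0.0]])
    r = np.array([2.0, 0.0])
    lam = alpha + W.sum(axis=1)
    print(race({"e01": 1.0}, alpha))
    print(evaluate_policy(lam, r, W))
```

A direct linear solve is fine for a handful of states. For the large factored models considered later in this chapter, iterative methods such as value iteration are the practical choice.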

The maximum expected reward is obtained by choosing the set of actions that maximizes the reward in the current state and acting optimally in subsequent states. We can express this with the recurrence

$$v^*(s) = \max_{A' \subseteq A} \frac{1}{\lambda_{A'}(s)} \Big( r_{A'}(s) + \sum_{s' \in S} w_{A'}(s'; s)\, v^*(s') \Big), \qquad (9.4)$$

derived from (9.3). Equation 9.4 forms the basis for value iteration for Markovian GSMDPs. Note the striking resemblance to (2.24) for discrete-time MDPs. Remember that the discount factor, $\gamma = e^{-\alpha}$, is present in $\lambda_{A'}(s)$ through the termination rate $\alpha$. We can also write (9.4) using matrix notation:

$$V^* = \max_{A' \subseteq A} H_{A'} \circ \big( R_{A'} + W_{A'} V^* \big) \qquad (9.5)$$

The operator $\circ$ represents the Hadamard product (element-wise matrix multiplication), and the maximization is taken state by state. This form is convenient, for example, when implementing value iteration using MTBDDs. The row vector $H_{A'}$ represents the expected holding time in each state: its entry for state $s$ is $1/\lambda_{A'}(s)$.
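A direct transcription of (9.5) into NumPy is sketched below. $H_{A'}$ is stored as a vector of holding times $1/\lambda_{A'}(s)$, and the maximization is taken state by state. As a test model, the sketch uses a Markovian version of the Foreman's Dilemma from Section 9.3.1. This rests on an assumption the thesis does not make: the failure delay is taken to be exponential, with a hypothetical rate `mu`. The candidate action sets in $s_0$ are "no service" and "service". Dense arrays stand in for the MTBDDs used in the actual implementation.

```python
import math
from typing import List, Tuple

import numpy as np

ALPHA = -math.log(0.95)          # termination rate, so that gamma = e^{-alpha}
C = np.array([1.0, 0.0, 0.5])    # reward rates c(s0), c(s1), c(s2)
Model = Tuple[np.ndarray, np.ndarray, np.ndarray]  # (H, R, W) for one action set


def foreman(mu: float, serve: bool) -> Model:
    """H, R, W for the action set chosen in s0 (service enabled iff `serve`)."""
    W = np.zeros((3, 3))
    W[0, 1] = mu            # failure: s0 -> s1 (exponential failure is an assumption)
    if serve:
        W[0, 2] = 10.0      # service action: s0 -> s2
    W[1, 0] = 1 / 100       # repair: s1 -> s0
    W[2, 0] = 1.0           # service completes: s2 -> s0
    lam = ALPHA + W.sum(axis=1)
    return 1.0 / lam, C.copy(), W   # no transition rewards, so R = c


def value_iteration(models: List[Model], tol: float = 1e-10,
                    max_iter: int = 1_000_000) -> Tuple[np.ndarray, np.ndarray]:
    """Iterate V <- max_{A'} H_{A'} o (R_{A'} + W_{A'} V) until the update is below tol."""
    V = np.zeros(len(models[0][1]))
    for _ in range(max_iter):
        Q = np.array([H * (R + W @ V) for H, R, W in models])  # '*' is the Hadamard product
        V_new = Q.max(axis=0)                                    # best action set per state
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
    raise RuntimeError("value iteration did not converge")


if __name__ == "__main__":
    V, choice = value_iteration([foreman(0.2, False), foreman(0.2, True)])
    print(V, "service in s0" if choice[0] == 1 else "no service in s0")
```

The update is a contraction because $\sum_{s'} w_{A'}(s'; s)/\lambda_{A'}(s) = 1 - \alpha/\lambda_{A'}(s) < 1$ in every state. With an exponential failure delay, however, the policy can only serve immediately or never, which is the limitation that phases are introduced to remove.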

9.2  Approximate  Solution  Technique 

The  previous  section  provided  a  dynamic  programming  formulation  of  optimal  GSMDP  planning.  We  noted 
that  the  general  case  involves  a  high-dimensional  probability  integral,  which  limits  the  practical  use  of  the 
formulation. If all delay distributions are exponential, however, a GSMDP is simply a continuous-time MDP,
and  the  dynamic  programming  formulation  becomes  manageable  as  shown  in  (9.4).  We  now  take  advantage 
of  this  fact,  presenting  an  approximate  solution  technique  for  GSMDPs  that  uses  phase-type  distributions. 

To  find  a  policy  for  a  GSMDP,  we  first  approximate  it  with  a  continuous-time  MDP  by  approximating 
each  non-exponential  delay  distribution  with  a  phase-type  distribution.  Recall  that  phase-type  distributions 
(Section  2.1.3)  represent  the  time  from  entry  until  absorption  in  a  Markov  process  with  n  transient  states 
(phases)  and  a  single  absorbing  state.  The  continuous-time  MDP  can  be  solved  exactly,  for  example  by 
using  value  iteration.  We  can  also  use  uniformization  to  obtain  a  discrete-time  MDP,  in  case  we  want  to  use 
an  existing  solver  for  discrete-time  models.  The  resulting  policy,  in  either  case,  may  be  phase-dependent. 
Phase  transitions  do  not  occur  in  the  actual  model,  so  in  order  to  execute  the  policy  in  the  real  world,  we 
simulate  phase  transitions.  Our  solution  method  is  summarized  in  Figure  9.1. 
Figure 9.1: Schematic view of solution technique for GSMDPs.


9.2.1  From  GSMDP  to  MDP 

We  first  present  our  method  for  approximating  a  GSMDP  with  an  MDP.  We  have  noted  that  if  all  events 
of  a  GSMDP  have  exponential  delay  distributions,  then  the  GSMDP  is  simply  a  continuous-time  MDP 
with  a  factored  transition  model.  By  using  phase-type  distributions,  we  can  replace  each  non-Markovian 
event  in  the  GSMDP  with  one  or  more  Markovian  events,  thereby  obtaining  a  continuous-time  MDP  that 
approximates  the  original  GSMDP. 

A GSMDP event is represented by a triple $\langle \phi_e, G_e, p_e \rangle$, and we assume a factored representation of the state space with state variables $V$. We also assume that $p_e$ is implicitly represented by an effect formula $\mathit{eff}_e$, using the effect formalism described in Section 8.2.1 with the addition of numeric state variables.

For each non-Markovian event $e$ with delay distribution $G_e$, we find a phase-type distribution of order $n_e$ approximating $G_e$. We add a phase variable $ph_e$ to $V$ for each event $e$ with $n_e > 1$ and replace $e$ with one or more Markovian events. A phase-type distribution consists of a set of phase transitions, and each phase transition can be represented by a Markovian event. We assume that the initial phase is always $ph_e = 1$, as this simplifies the handling of interacting events. A phase transition from phase $i$ to phase $j$ with rate $\lambda_{ij}$ is represented by an event with enabling condition $\phi_e \wedge ph_e = i$ and delay distribution $\mathrm{Exp}(\lambda_{ij})$. Ignoring for the moment possible event interactions, the effect formula for the phase transition event is $ph_e \leftarrow j$ if $j \leq n_e$ and $\mathit{eff}_e \wedge ph_e \leftarrow 1$ otherwise. A transition to phase $n_e + 1$ represents the triggering of the original event $e$ and resets the phase to its initial value. We associate a transition reward of zero with pure phase transitions and $k_e(s, s')$ with phase transitions representing the triggering of event $e$.
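The replacement of a non-Markovian event by phase transition events can be sketched as a pure data transformation. The `PhaseEvent` record below is a hypothetical stand-in for the effect-formula representation of Section 8.2.1. It records the enabling phase, the rate, the phase assigned by the effect, and whether $\mathit{eff}_e$ is applied. As an example, the sketch expands a three-phase Erlang distribution with rate 6. That distribution has mean $1/2$ and variance $1/12$, so it matches the first two moments of $U(0, 1)$.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhaseEvent:
    """Hypothetical record: Markovian event for one phase transition of `source`."""
    name: str
    source: str
    phase: int        # enabling condition: phi_source and ph_source == phase
    rate: float       # delay distribution Exp(rate)
    next_phase: int   # effect assigns ph_source <- next_phase
    fires: bool       # True if the effect also includes eff_source (reward k_e);
                      # pure phase transitions carry transition reward zero


def expand(event: str, n: int,
           transitions: List[Tuple[int, int, float]]) -> List[PhaseEvent]:
    """Turn phase transitions (i, j, rate_ij) of an n-phase phase-type distribution
    into Markovian events; j == n + 1 means absorption, i.e. `event` triggers."""
    result = []
    for i, j, rate in transitions:
        fires = j == n + 1
        result.append(PhaseEvent(f"{event}[{i}->{j}]", event, i, rate,
                                 1 if fires else j, fires))
    return result


def erlang(n: int, rate: float) -> List[Tuple[int, int, float]]:
    """Phase i moves to phase i + 1 at the given rate; phase n absorbs."""
    return [(i, i + 1, rate) for i in range(1, n + 1)]


if __name__ == "__main__":
    for ev in expand("reboot", 3, erlang(3, 6.0)):
        print(ev)
```

A general phase-type distribution only changes the `transitions` list, for example by adding a transition from phase 1 directly to absorption. The translation itself stays the same.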

The triggering of an event $e$ in state $s$ can cause another event $e'$, enabled in $s$, to become disabled in the state following the triggering of $e$. When $e$ disables a non-Markovian event $e'$, we should reset the phase of the phase-type distribution for $e'$ (i.e., set $ph_{e'}$ to one). We can think of the phases as a partitioning into random-length intervals of the time $u_e$ that an event $e$ has remained continuously enabled without triggering. Resetting the phase of an event corresponds to resetting $u_e$ to zero. To account for this sort of interaction between events, we need to modify the effects of events that do not simply change the value of a phase variable. Let $\phi'_e$ represent the condition of an event $e$, evaluated in the next state rather than the current state. The final effect formula for an event $e$ is obtained by adding the effect $(\neg\phi'_{e'}) \triangleright ph_{e'} \leftarrow 1$ to $\mathit{eff}_e$ for all non-Markovian events $e' \neq e$.
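The extra conditional effect $(\neg\phi'_{e'}) \triangleright ph_{e'} \leftarrow 1$ can be sketched with states as Python dictionaries and conditions as predicates evaluated on the next state. This encoding is hypothetical. The example uses the Foreman's Dilemma of Section 9.3.1: a failure moves the machine from $s_0$ to $s_1$, which disables the service action and must therefore reset its phase. A pure phase transition of the failure event leaves the machine in $s_0$ and keeps the service phase.

```python
from typing import Callable, Dict

# Hypothetical encoding: a state assigns integers to named variables.
State = Dict[str, int]
Effect = Callable[[State], State]
Condition = Callable[[State], bool]


def apply_event(state: State, fired: str, effect: Effect,
                conditions: Dict[str, Condition]) -> State:
    """Apply eff_fired, then reset ph_e' to 1 for every other non-Markovian event e'
    whose enabling condition is false in the next state."""
    nxt = effect(dict(state))  # work on a copy of the current state
    for other, phi in conditions.items():
        if other != fired and not phi(nxt):
            nxt["ph_" + other] = 1
    return nxt


def failure(s: State) -> State:
    s["machine"] = 1      # eff_failure: s0 -> s1
    s["ph_failure"] = 1   # triggering resets the event's own phase
    return s


def failure_phase(s: State) -> State:
    s["ph_failure"] += 1  # pure phase transition: the machine stays in s0
    return s


if __name__ == "__main__":
    conds = {"service": lambda s: s["machine"] == 0,
             "failure": lambda s: s["machine"] == 0}
    s = {"machine": 0, "ph_service": 2, "ph_failure": 1}
    print(apply_event(s, "failure", failure, conds))        # ph_service reset to 1
    print(apply_event(s, "failure", failure_phase, conds))  # ph_service kept at 2
```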

We now have a method for approximating a GSMDP with a continuous-time MDP. If the delay distributions of the GSMDP closely match the phase-type distributions used in the MDP, then we expect the approximation to be close, although we have no quantitative measure of how good the approximation is. Section 9.3 provides evidence that the approximation technique works well in practice.

9.2.2  Policy  Execution 

The execution history of a GSMDP can be represented by a set of real-valued variables, one for each event $e \in E$, representing the time $e$ has been continuously enabled without triggering. The phases introduced when approximating a GSMDP with a continuous-time MDP can be thought of as a randomized discretization of the time events have remained enabled. For example, approximating $G$ with an $n$-phase Erlang distribution with parameters $n$ and $\lambda$ discretizes the time that the event with delay distribution $G$ has been enabled into $n$ random-length intervals. The length of each interval is a random variable with distribution $\mathrm{Exp}(\lambda)$. A policy for the continuous-time MDP with phase transitions is therefore approximately a mapping from states, and the times events have been enabled, to actions for the original GSMDP. We can also think of phase transitions as a factored representation of the distribution $\tau(\sigma)$, which governs the time to spend in a state before considering a change of action choice.

Phase transitions are not part of the original model, so we have to simulate them when executing the policy obtained for the approximate model. When a GSMDP event or action $e$ becomes enabled during execution, we sample a first phase transition time $t_1$ for the phase-type distribution used to approximate $G_e$. If $e$ remains enabled for $t_1$ time units without triggering, we increment the phase associated with $e$ and sample a second phase transition time $t_2$. This continues until one of two things happens. If $e$ triggers or is disabled, the phase is reset to one. If we reach the last phase, the phase does not change until $e$ triggers or is disabled. The action choice can change every time a simulated phase transition occurs, although phase transitions do not change the actual state of the process. This allows us to take into account the time spent in a state when selecting which action to enable. We will see in the next section that this can produce better policies than if actions are chosen only at actual state transitions.
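The execution-time simulation of phases can be sketched as a small per-event clock. The `SimulatedPhase` class is hypothetical. It assumes a sequential (Erlang-like) phase structure in which phase $i$ can only move to phase $i + 1$, and it follows the procedure described above. A transition time is sampled on entering each phase, the phase advances when that time passes, the last phase never advances during execution, and the clock is reset whenever the event triggers or is disabled.

```python
import math
import random
from typing import List


class SimulatedPhase:
    """Hypothetical clock for the simulated phase of one GSMDP event or action.

    rates[i - 1] is the rate of leaving phase i in a sequential phase-type
    approximation. The rate of the last phase is never used, because during
    execution the last phase only ends when the real event triggers."""

    def __init__(self, rates: List[float], rng: random.Random) -> None:
        self.rates = rates
        self.rng = rng
        self.reset(0.0)

    def reset(self, now: float) -> None:
        """Call when the event becomes enabled, triggers, or is disabled."""
        self.phase = 1
        self.next_at = self._draw(now)

    def _draw(self, now: float) -> float:
        if self.phase >= len(self.rates):
            return math.inf  # last phase: no further simulated transitions
        return now + self.rng.expovariate(self.rates[self.phase - 1])

    def advance(self, now: float) -> int:
        """Apply all simulated phase transitions up to time `now` and return the
        phase; the policy may reconsider its action choice whenever it changes."""
        while self.phase < len(self.rates) and self.next_at <= now:
            self.phase += 1
            self.next_at = self._draw(self.next_at)
        return self.phase


if __name__ == "__main__":
    clock = SimulatedPhase([6.0, 6.0, 6.0], random.Random(0))  # Erlang(3, 6)
    print([clock.advance(t) for t in (0.1, 0.2, 0.4, 0.8)])
```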

The phases can also be thought of as partially observable state variables. We cannot observe the phase of a trigger time distribution. We can observe actual state transitions, however, so we know how much time we have spent in a state. At any given time, we can compute a probability distribution over phases, which can be used to select the action to enable at that time. This is analogous to the QMDP solution technique for partially observable MDPs (Littman et al. 1995), and could result in higher expected value during execution than if phase transitions are simulated. One disadvantage, however, is that time is continuous, which means that the belief distribution over phase assignments changes continuously. In practice, we could select a frequency at which to update the belief distribution and reconsider the current action choice, but there is no clear choice for such an update frequency. It may be wasteful to update the belief state with high frequency, and we risk missing important phase changes if the update frequency is too low. Belief tracking may be computationally expensive as well. We leave it to future research to explore this and other ways of executing a phase-dependent policy.
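For a sequential Erlang approximation, the belief over phases has a simple closed form, which gives a feel for the QMDP-style alternative. Suppose the event has been enabled for $t$ time units without triggering. The number of completed phase transitions is then Poisson distributed with mean $\lambda t$, conditioned on being less than $n$. The sketch below assumes this Erlang structure and a hypothetical table `q` of phase-dependent action values. It illustrates the idea only; it is not a method evaluated in this thesis.

```python
import math
from typing import Dict, List


def erlang_phase_belief(n: int, rate: float, t: float) -> List[float]:
    """P(phase = i | no trigger by time t), i = 1..n, for an Erlang(n, rate) delay.
    Phase i means exactly i - 1 completed transitions (Poisson counts); the common
    factor e^{-rate t} cancels in the normalization."""
    weights = [(rate * t) ** k / math.factorial(k) for k in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def qmdp_choice(belief: List[float], q: Dict[str, List[float]]) -> str:
    """Pick the action maximizing the belief-weighted value sum_i b_i Q(phase i, a)."""
    return max(q, key=lambda a: sum(b * v for b, v in zip(belief, q[a])))


if __name__ == "__main__":
    q = {"wait": [1.0, 0.8, 0.5], "act": [0.6, 0.7, 0.9]}  # hypothetical values
    for t in (0.0, 0.3, 1.0):
        b = erlang_phase_belief(3, 6.0, t)
        print(t, [round(x, 3) for x in b], qmdp_choice(b, q))
```

The belief moves continuously with $t$, which is the practical difficulty noted above. A discrete update frequency would amount to calling `erlang_phase_belief` only at chosen time points.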

9.3  Experimental  Results 

We have implemented a basic GSMDP planner based on the solution procedure outlined in Figure 9.1. Our implementation uses MTBDDs to represent matrices and vectors, similar to the approach proposed by Hoey et al. (1999) for discrete-time MDPs. MTBDDs use Boolean state variables, and we need $\lceil \log_2 s \rceil$ bits to represent the phase of a phase-type distribution with $s$ phases. The experimental results were generated on a 3 GHz Pentium 4 PC running Linux, with an 800 MB memory limit per process.

9.3.1  Preventive  Maintenance  (“The  Foreman’s  Dilemma”) 

Our first test case is a variation of Howard’s “Foreman’s Dilemma” (Howard 1960), where we have a machine that can be working ($s_0$), failed ($s_1$), or serviced ($s_2$). This example is meant to show that it can be beneficial to delay the enabling of an action in a state, and that phases allow us to do so. A failure event with delay distribution $G$ is enabled in $s_0$ and causes a transition to $s_1$. Once in $s_1$, the repair time for the machine has distribution $\mathrm{Exp}(1/100)$. At any time in $s_0$, the foreman can choose to enable a service action with delay distribution $\mathrm{Exp}(10)$. If this action triggers before the failure event, the system enters $s_2$, where the machine is being serviced with service time distribution $\mathrm{Exp}(1)$. The reward rates are $c(s_0) = 1$, $c(s_1) = 0$, and $c(s_2) = 1/2$, independent of action choice, and there are no transition rewards. The problem is to produce a service policy that maximizes the expected infinite-horizon discounted reward in $s_0$.

Depending on the failure time distribution $G$, it may be beneficial to enable the service action at some point in $s_0$. This is because it takes a long time to recover from failure, while the return to $s_0$ from the service state is quick. Still, the reward rate is highest in $s_0$, so there is an incentive to delay the enabling of the service action. The optimal policy in this case is to enable the service action after spending $t_0$ time units in $s_0$, where the best choice for $t_0$ depends on the shape of $G$. The lower the probability that failure occurs early, the later we can schedule the enabling of the service action.

We can model this problem as an SMDP. If we enable the service action after $t_0$ time units in $s_0$, the probability of the service action triggering before the failure event is $P_{02} = 1 - \int_{t_0}^{\infty} 10 e^{-10(t - t_0)} F(t)\,dt$, where $F(t)$ is the cumulative distribution function for $G$. We can solve the SMDP using the techniques described by Howard (1971b), but then we can choose to enable the action in $s_0$ only immediately ($t_0 = 0$) or not at all ($t_0 = \infty$). Alternatively, we can express the expected reward in $s_0$ as a function of $t_0$ and use numerical solution techniques to find the value for $t_0$ that maximizes the expected reward. Depending on the shape of $F(t)$, both approaches may require numerical integration over semi-infinite intervals.
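The formula for $P_{02}$ is straightforward to evaluate numerically. Equivalently, $P_{02} = \Pr[T > t_0 + X]$ for failure time $T \sim G$ and service delay $X \sim \mathrm{Exp}(10)$. The sketch below uses a truncated Simpson rule in plain Python; the truncation point and step count are arbitrary choices. When $G$ is exponential with rate $\mu$, the closed form $P_{02} = e^{-\mu t_0} \cdot 10/(10 + \mu)$ serves as a check.

```python
import math
from typing import Callable


def p02(F: Callable[[float], float], t0: float, horizon: float = 5.0,
        n: int = 20_000) -> float:
    """P02 = 1 - integral_{t0}^{inf} 10 e^{-10 (t - t0)} F(t) dt, with the service
    action enabled after t0 time units in s0. The integral is truncated at t0 + horizon
    (the weight is below e^{-50} beyond that) and evaluated with Simpson's rule."""
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = horizon / n
    total = 0.0
    for k in range(n + 1):
        t = t0 + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * 10.0 * math.exp(-10.0 * (t - t0)) * F(t)
    return 1.0 - total * h / 3.0


def uniform_cdf(a: float, b: float) -> Callable[[float], float]:
    return lambda t: min(1.0, max(0.0, (t - a) / (b - a)))


if __name__ == "__main__":
    F = uniform_cdf(5.0, 6.0)   # G = U(5, 6)
    for t0 in (0.0, 4.0, 5.0, 5.5):
        print(t0, p02(F, t0))
```

Maximizing the expected reward over $t_0$ would wrap a function like this in a one-dimensional optimizer. That step, and the integrals for the reward itself, are not shown.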

Figure 9.2 plots the expected discounted reward, as a percentage of optimal, for policies obtained using standard SMDP solution techniques as well as our technique for approximating a (G)SMDP with an MDP using phase-type distributions. A uniform failure time distribution over the interval $(5, b)$ was used. The optimal value and the value for the SMDP solution were computed numerically using MATLAB. The other values were computed by simulating execution of the phase-dependent policies and averaging the discounted reward over 5000 sample trajectories. We used $\gamma = 0.95$ as the discount factor.

Note that the SMDP solution is well below the optimal solution because it has to enable the service action in $s_0$ either immediately or not at all. For small values of $b$, the optimal SMDP policy is to enable the action in $s_0$. As $b$ increases, however, so does the expected failure time, so for larger $b$ it is better not to enable the action, because this allows us to spend more time in $s_0$, where the reward rate is highest. The performance of the policy obtained by matching a single moment of $G$ is almost identical to that of the SMDP solution.
Figure 9.2: Policy value, as a percentage of the optimal value, for the Foreman’s Dilemma with the failure time distribution $G$ being $U(5, b)$.

Figure 9.3: Number of phases required to match two moments of a uniform distribution over the interval $(5, b)$. The dotted lines indicate 4 and 8 phases.

This policy is also restricted to enabling the action either immediately or not at all in $s_0$, since there is just one phase in the approximation. Due to the approximation of $G$, the performance is slightly worse around the point where the optimal SMDP policy changes. We can see that by matching two moments (with a generalized Erlang distribution), the quality of the policy can be increased significantly. Note that the number of phases required to match two moments of $U(5, b)$ varies with $b$, as shown in Figure 9.3. For $b = 6$, over 300 phases are needed, which helps to explain the high quality at this point of the policy obtained by matching two moments. We also show the value for policies obtained by fixing the number of phases and using the EM algorithm to find a phase-type distribution with a good fit. Note that using 8 phases instead of 4 actually hurts the quality of the policy for some values of $b$. In these cases, the 8-phase distribution causes the enabling of the service action to be delayed for too long.

Figure 9.4: Policy value, as a percentage of the optimal value, for the Foreman’s Dilemma with the failure time distribution $G$ being $W(1.6a, 4.5)$.

Figure 9.4 shows the performance of policies for a different failure time distribution: a Weibull distribution with parameters $1.6a$ and $4.5$. In this case, a 16-phase generalized Erlang distribution is sufficient to match two moments for all values of $a$. We can see that the policy obtained by using 8 phases and EM fitting actually outperforms the policy obtained by matching two moments, if only slightly, and we can get even better performance by using 24 phases. For $a = 10$, the solution obtained with 8 phases gives us a 34 percent increase in value compared to the SMDP solution, and the value increase is 50 percent with 24 phases. The SMDP and single-moment solutions again have almost identical performance, and are for the most part significantly worse than the other solutions. The only exception is for low values of $a$. There, the phase-type distributions underestimate the probability that failure will occur at a very early stage, so the service action is enabled later than needed to perform well. In most situations, however, using more phases gives better policies, mainly because the additional phases allow us to better account for the fact that $G$ is not memoryless. For the Foreman’s Dilemma, this is crucial, as it allows us to delay the enabling of the service action in $s_0$, taking into account the fact that failure is unlikely to occur early on.
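The phase counts quoted above can be reproduced from a standard fact about phase-type distributions. An $n$-phase phase-type distribution has squared coefficient of variation $cv^2$ (variance over squared mean) at least $1/n$, with equality for the Erlang distribution. Matching two moments of a distribution with $cv^2 < 1$ therefore needs at least $\lceil 1/cv^2 \rceil$ phases. The sketch below assumes that the generalized Erlang fits in this chapter use this minimal order. Under that assumption it gives over 300 phases for $U(5, 6)$ and 16 phases for the Weibull distribution with shape 4.5, independent of the scale $1.6a$. It also gives three phases for $U(0, 1)$, the reboot delay of Section 9.3.2.

```python
import math
from fractions import Fraction
from typing import Union

Number = Union[Fraction, float]


def min_phases(cv2: Number) -> int:
    """Smallest order n of a phase-type distribution with squared coefficient of
    variation cv2, for 0 < cv2 <= 1 (an n-phase distribution has cv2 >= 1/n).
    Assumes the two-moment fit uses this minimal order."""
    if not 0 < cv2 <= 1:
        raise ValueError("this sketch only covers 0 < cv2 <= 1")
    return math.ceil(1 / cv2)


def uniform_cv2(a: int, b: int) -> Fraction:
    """cv^2 of U(a, b), computed exactly to avoid rounding at integer boundaries."""
    mean = Fraction(a + b, 2)
    variance = Fraction((b - a) ** 2, 12)
    return variance / mean ** 2


def weibull_cv2(shape: float) -> float:
    """cv^2 of a Weibull distribution; it does not depend on the scale parameter."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return g2 / g1 ** 2 - 1


if __name__ == "__main__":
    for b in (6, 7, 10, 20):
        print(f"U(5, {b}):", min_phases(uniform_cv2(5, b)))
    print("U(0, 1):", min_phases(uniform_cv2(0, 1)))
    print("Weibull shape 4.5:", min_phases(weibull_cv2(4.5)))
```

Exact rational arithmetic matters for the uniform case. In floating point, $1/cv^2$ for $U(0, 1)$ can come out slightly above 3, which would round up to 4.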

9.3.2  System  Administration  Problem 

Our second test case is a system administration problem, loosely based on a similar problem described by Guestrin et al. (2003). The first test case illustrated that phases can result in better policies by delaying the enabling of an action in a state. This test case illustrates that phases can also help by keeping an action enabled if it has already been enabled for some time. In both cases, phases introduce memory into the state space.

In the system administration problem, there is a network of $n$ computers, each of which is either up or down. For each computer, a crash event can cause the computer, if currently up, to go down at a random point in time. The delay of the crash event is governed by an exponential distribution with unit rate. To make this a decision problem, we add a reboot action for each machine that can be enabled whenever the machine is down. The delay distribution for this action is $U(0, 1)$. The reward rate for a state is equal to the number of machines that are up, so in a state with all machines up we earn a reward of $n$ per time unit. We assume that there is a single system administrator managing the network, so only a single reboot action can be enabled at any point in time. Unlike the previous test case, this is not an SMDP, except for $n = 1$, because a reboot action may remain enabled across state transitions (caused by a crash event). We therefore cannot solve this problem using existing SMDP solution techniques. The obvious solution is to reboot a machine whenever it goes down, and wait until rebooting is finished before going on to reboot another machine. The problem is that in a Markov formulation we would not know that we have been rebooting a machine when another machine goes down. The introduction of phases gives us that information and therefore enables us to obtain better policies.

Figure 9.5: Expected discounted reward for the system administration problem with $n$ machines. The expected reward is reported for state $s_0$ with all $n$ machines up.

Figure 9.5 plots the expected discounted reward ($\gamma = 0.95$) of the policy obtained by our GSMDP planner when approximating each uniform distribution with a phase-type distribution. We report the values obtained when matching one and two moments (using a three-phase Erlang distribution), and when fixing the number of phases per uniform distribution to 2, 4, and 8. By using the EM algorithm with at least two phases, we can increase the expected reward by up to 10 percent compared with the solution obtained by matching only a single moment. When matching a single moment, we can enable a reboot action based only on which machines are currently down, and the resulting policy reboots machine $i$ before machine $j$ if $i < j$. In contrast, the policy obtained when using multiple phases keeps a reboot action enabled if it is in a phase other than one, because it is expected to trigger soon. By using more than two phases, we can increase the expected reward even further, although the increase is not as significant.

Figure 9.6: Planning time for the system administration problem.

Figure 9.7: Size of potentially reachable state space for the system administration problem.

By increasing the number of phases used to represent a non-exponential distribution, we increase the accuracy of the approximation, but we also increase the state space. In terms of planning time, a larger state space means that a solution will take longer to obtain. Thus, as one would expect, better policies are in general obtained at the price of longer solution times. The solution time for the system administration problem, not including phase-type fitting¹ and model construction, is shown in Figure 9.6. Figure 9.7 plots the size of the potentially reachable state space (from the state with all machines up) as a function of the number of machines, $n$. If we use $s$ phases to represent a non-exponential distribution, then the size of the reachable state space is at most $((s - 1)n/2 + 1) \cdot 2^n$. Note that $d = (\lceil \log_2 s \rceil + 1)n$ Boolean state variables are used for a problem with $n$ machines, but the reachable state space is significantly smaller than $2^d$ for $s > 1$. For $n = 13$ and $s = 8$, we have $d = 52$, while the size of the state space is under $4 \cdot 10^5$ ($< 2^{19}$).
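One way to see where the bound comes from is the following counting argument, which is our reading of the model rather than a description of the planner's reachability analysis. In a configuration with $k$ machines down, at most one reboot action can be past phase 1. This gives one state with all reboot phases at 1, plus at most $(s-1)k$ states in which one of the $k$ pending reboots is in phase $2, \ldots, s$. Summing $\binom{n}{k}(1 + (s-1)k)$ over $k$ yields $((s-1)n/2 + 1)\cdot 2^n$. The sketch checks the closed form against brute-force enumeration and reproduces the numbers for $n = 13$ and $s = 8$, namely 380,928 states and $d = 52$.

```python
from itertools import product


def bound_closed_form(n: int, s: int) -> int:
    """((s - 1) n / 2 + 1) * 2^n for n >= 1, written with integers only."""
    return (s - 1) * n * 2 ** (n - 1) + 2 ** n


def bound_by_enumeration(n: int, s: int) -> int:
    """Counting argument (our reading): sum over up/down configurations of
    1 + (s - 1) * (number of machines down)."""
    return sum(1 + (s - 1) * sum(cfg) for cfg in product((0, 1), repeat=n))


def boolean_vars(n: int, s: int) -> int:
    """d = (ceil(log2 s) + 1) n: one up/down bit plus the phase bits per machine."""
    phase_bits = (s - 1).bit_length()  # exact ceil(log2 s) for s >= 1
    return (phase_bits + 1) * n


if __name__ == "__main__":
    n, s = 13, 8
    print(boolean_vars(n, s), bound_closed_form(n, s), 2 ** 19)
```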


9.3.3  State  Filtering  and  Uniformization 

We conclude the empirical evaluation of our planning approach with a discussion of techniques for reducing planning time. The first technique is related to the use of Boolean state variables to encode the phase of a distribution. If the number of phases is not a power of 2, then we are potentially introducing spurious states into the model. This could mean that we are wasting time computing the optimal action choice for irrelevant states. It is also the case that the phase associated with the delay distribution for an action or event is not significant when the action or event is disabled. By convention we set the phase to one for disabled actions and events, so a different phase assignment for a disabled action or event corresponds to a spurious state.

¹ The time for phase-type fitting ranges from a few milliseconds (2 phases) to a few minutes (8 phases) for the EM approach.

Consider the recurrence in (9.5), which we use in our implementation of value iteration for continuous-time MDPs. To avoid computing the optimal action choice for evidently spurious states, we can apply a filter to the vectors and/or the matrix involved in the computation. For example, we can set to zero the elements of the row vector $H_{A'}$ for spurious states, or we could set to zero all entries of $W_{A'}$ corresponding to such states. Applying a filter to a vector or matrix represented by an MTBDD could result in a larger representation, which could in turn increase planning times. Figure 9.8 shows the effect of filtering for the system administration problem, with different choices of $s$ (the number of phases to use for each non-exponential distribution). We can see that filtering just the $H_{A'}$ vectors results in the best performance, while using no filter at all leads to a noticeable performance degradation as $n$ increases. Filtering helps even when $s$ is a power of 2, because the phase is forced to be one for reboot actions that are not enabled.
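The filters can be sketched with dense NumPy arrays in place of MTBDDs, under a hypothetical encoding. Each machine contributes an up/down bit and a phase code of $\lceil \log_2 s \rceil$ bits for its reboot action, with code $k$ meaning phase $k + 1$. A state counts as evidently spurious if some phase code is $s$ or more, or if a machine that is up (so that its reboot action is disabled) has a phase other than one. Filtering $H_{A'}$ then amounts to zeroing its entries for those states.

```python
from itertools import product
from typing import List, Tuple

import numpy as np

# Hypothetical encoding: per machine a pair (up bit, phase code).
Encoded = Tuple[Tuple[int, int], ...]


def encoded_states(n: int, s: int) -> List[Encoded]:
    """All assignments to the Boolean variables, grouped per machine."""
    codes = range(2 ** (s - 1).bit_length())
    return list(product(product((0, 1), codes), repeat=n))


def spurious_filter(states: List[Encoded], s: int) -> np.ndarray:
    """True for states that are not evidently spurious."""
    return np.array([all(code < s and (up == 0 or code == 0) for up, code in st)
                     for st in states])


def filter_holding_times(H: np.ndarray, keep: np.ndarray) -> np.ndarray:
    """Zero the expected holding time of spurious states, so (9.5) yields 0 there."""
    return np.where(keep, H, 0.0)


if __name__ == "__main__":
    for s in (3, 4):
        states = encoded_states(1, s)
        keep = spurious_filter(states, s)
        print(s, len(states), int(keep.sum()))
```

With one machine, both $s = 3$ and $s = 4$ use two phase bits and eight encoded states. The filter keeps 4 and 5 of them respectively, so even when $s$ is a power of 2 some states are filtered out, namely those with an up machine whose reboot phase is not one.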

We can solve a continuous-time MDP directly, using the recurrence in (9.4). Alternatively, we can use uniformization to transform the continuous-time MDP into a discrete-time MDP, and solve the resulting problem. Uniformization transforms a continuous-time MDP with state-dependent exit rates into an equivalent continuous-time MDP with the same (uniform) exit rate for all states. The uniform continuous-time MDP can then be treated as a discrete-time MDP resulting from observing the original continuous-time MDP at a constant rate. Uniformization introduces self-transitions not present in the original model, because it is possible that we remain in the same state from one observation to another. Uniformization seems to be promoted as the standard solution technique for continuous-time MDPs (cf. Puterman 1994), but it is not clear what the benefit is of using uniformization rather than solving the continuous-time MDP directly. In fact, as Figure 9.9 indicates, uniformization can actually hurt performance. The introduction of virtual self-transitions increases the complexity of the transition matrix, which makes each iteration of value iteration take longer.
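Uniformization for a fixed policy can be written in a few lines. Let $q(s) = \sum_{s'} w(s'; s)$ be the exit rate excluding termination, and choose a uniform rate $\Lambda \ge \max_s q(s)$. The discrete-time model then has transition matrix $P = I + (W - \mathrm{diag}(q))/\Lambda$, per-step reward $r/(\Lambda + \alpha)$, and discount factor $\Lambda/(\Lambda + \alpha)$. The diagonal of $P$ holds the virtual self-transitions mentioned above. The sketch below uses a hypothetical three-state model and checks that both formulations give the same values. It follows the standard construction rather than the planner's own code.

```python
import math
from typing import Optional, Tuple

import numpy as np


def uniformize(r: np.ndarray, W: np.ndarray, alpha: float,
               rate: Optional[float] = None) -> Tuple[np.ndarray, np.ndarray, float]:
    """Discrete-time (reward, P, discount) equivalent to the continuous-time policy
    model with reward rates r, transition rates W, and termination rate alpha."""
    q = W.sum(axis=1)                      # state-dependent exit rates
    lam = q.max() if rate is None else rate
    if lam <= 0 or lam < q.max():
        raise ValueError("uniform rate must be positive and at least the max exit rate")
    P = np.eye(len(q)) + (W - np.diag(q)) / lam  # slack lam - q(s) becomes a self-loop
    return r / (lam + alpha), P, lam / (lam + alpha)


if __name__ == "__main__":
    alpha = -math.log(0.95)
    # Hypothetical three-state model (rates and rewards chosen for illustration).
    W = np.array([[0.0, 2.0, 1.0], [0.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
    r = np.array([1.0, 0.0, 2.0])
    r_d, P, beta = uniformize(r, W, alpha)
    v_discrete = np.linalg.solve(np.eye(3) - beta * P, r_d)
    v_continuous = np.linalg.solve(np.diag(alpha + W.sum(axis=1)) - W, r)
    print(np.diag(P), v_discrete, v_continuous)
```

State 1 leaves at rate 0.5 against a uniform rate of 3. It therefore gets a self-loop probability of $5/6$ in $P$, an entry that does not exist in $W$.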

In conclusion, we have shown that phase-type distributions are useful for solving decision theoretic planning problems with asynchronous events and actions. Using more phases often results in better policies, but also increases planning times. State filtering can help to reduce planning times.
Chapter 10


Conclusion  and  Future  Work 


At the outset of this thesis, we embarked on an ambitious endeavor to develop algorithms for both planning and verification with asynchronous events. We believe our research effort to be a good start towards practical solution techniques for asynchronous stochastic systems, but we certainly acknowledge that we have only scratched the surface of this vast area of research.

In  verification,  we  have  established  the  foundations  of  statistical  probabilistic  model  checking.  A  key 
observation  is  that  probabilistic  model  checking  can  be  modeled  as  a  hypothesis  testing  problem.  We  can 
therefore  use  well-established  and  efficient  statistical  hypothesis  testing  techniques,  in  particular  sequential 
acceptance  sampling,  for  probabilistic  model  checking.  Our  model  checking  approach  is  not  tied  to  any 
specific  statistical  test.  The  only  requirement  is  that  we  can  bound  the  probability  of  an  incorrect  answer 
(either  a  false  positive  or  a  false  negative).  A  potential  benefit  of  statistical  techniques  is  that  they  tend  to  be 
highly  amenable  to  parallelization.  We  show  this  to  be  the  case  for  statistical  model  checking,  although  some 
care must be taken so as not to introduce bias in the sampling process. Our solution to this problem results
in  a  distributed  algorithm  for  probabilistic  model  checking  that  can  take  full  advantage  of  a  heterogeneous 
computing  environment  without  the  need  for  any  explicit  communication  of  performance  characteristics. 
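As an illustration of sequential acceptance sampling, here is a minimal sketch of Wald's sequential probability ratio test for a probabilistic operator with threshold θ and indifference region half-width δ. The `sample` callback stands in for "generate a trajectory and check the path formula" and is an assumption of the sketch. The stopping thresholds are Wald's approximations, so α and β bound the two error probabilities only approximately.

```python
import math
import random
from typing import Callable, Tuple


def sprt(sample: Callable[[], bool], theta: float, delta: float,
         alpha: float, beta: float, max_samples: int = 1_000_000) -> Tuple[bool, int]:
    """Wald's SPRT for H0: p >= theta + delta versus H1: p <= theta - delta.

    p is the probability that a sampled trajectory satisfies the path formula.
    Returns (True if H0 is accepted, number of samples used).
    """
    p0, p1 = theta + delta, theta - delta
    if not 0.0 < p1 < p0 < 1.0:
        raise ValueError("indifference region must lie strictly inside (0, 1)")
    accept_h1 = math.log((1.0 - beta) / alpha)  # Wald's upper threshold
    accept_h0 = math.log(beta / (1.0 - alpha))  # Wald's lower threshold
    llr = 0.0  # log of the likelihood ratio of H1 to H0
    for m in range(1, max_samples + 1):
        if sample():
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= accept_h1:
            return False, m
        if llr <= accept_h0:
            return True, m
    raise RuntimeError("no decision within max_samples")


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated trajectories satisfy the path formula with probability 0.9.
    print(sprt(lambda: rng.random() < 0.9, theta=0.5, delta=0.05, alpha=0.01, beta=0.01))
```

Each sample is an independent trajectory, so the calls to `sample` are the part that can be farmed out to parallel workers. The caution about bias applies to how their results are merged back into `llr`.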

We  have  considered  only  transient  properties  of  stochastic  systems.  The  logic  CSL,  as  described  by  Baier 
et  al.  (2003),  can  also  express  steady-state  properties.  Statistical  techniques  for  steady-state  analysis  exist, 
including  batch  means  analysis  and  regenerative  simulation  (Bratley  et  al.  1987).  Although  these  techniques 
have been used for statistical estimation, we are confident that they could be adapted for hypothesis testing as well. Extending our work on statistical probabilistic model checking to steady-state properties is therefore a prime candidate for future work. To handle probability thresholds close to zero and one more efficiently, the use of importance sampling (Heidelberger 1995) may also be possible. It would moreover be worthwhile to explore Bayesian techniques for acceptance sampling, in particular the test developed by Lai (1988). It is well known that the sequential probability ratio test, while generally very efficient, tends to require a large sample size if the true probability lies in the indifference region of the test. This is unfortunate, because we spend the most effort where we are indifferent to the outcome. This shortcoming is addressed
by  Bayesian  hypothesis  testing.  The  challenge  would  be  to  devise  a  Bayesian  test  for  conjunctive  and  nested 
probabilistic  operators.  A  final  topic  for  future  work,  which  we  have  not  discussed  much  in  this  thesis, 
is  to  improve  the  efficiency  of  discrete  event  simulation  for  our  representation  of  stochastic  discrete  event 
systems.  A  bottleneck  in  our  current  implementation  is  the  determination  of  enabled  events  in  a  state.  Our 
solution  is  to  scan  through  the  list  of  all  events  and  evaluate  the  enabling  condition  for  each  event.  This  is 
not  efficient  for  models  with  many  events.  We  think  that  perhaps  the  use  of  symbolic  data  structures,  such 
as  BDDs  and  MTBDDs,  could  speed  up  the  generation  of  sample  trajectories. 
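Batch means analysis is one of the steady-state techniques mentioned above. It can be sketched as follows, assuming the observations are taken at equally spaced time points along one long trajectory (for example, 1 if the current state satisfies a formula and 0 otherwise). The function name and the use of a normal quantile instead of a Student t quantile are choices made for this illustration. The sketch only estimates a steady-state probability; turning such estimates into a hypothesis test is the open question raised above.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def batch_means(observations: Sequence[float], num_batches: int = 20,
                warmup: int = 0, confidence: float = 0.95) -> Tuple[float, float]:
    """Estimate a steady-state mean from a single long simulation run.

    Discards the first `warmup` observations, splits the rest into
    `num_batches` equal batches, and treats the batch averages as roughly
    independent. Returns (point estimate, confidence-interval half-width).
    """
    data = list(observations[warmup:])
    size = len(data) // num_batches
    if num_batches < 2 or size == 0:
        raise ValueError("not enough observations for the requested batches")
    averages = [mean(data[b * size:(b + 1) * size]) for b in range(num_batches)]
    estimate = mean(averages)
    # Normal quantile for brevity; a Student t quantile with
    # num_batches - 1 degrees of freedom is the usual, more careful choice.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(averages) / num_batches ** 0.5
    return estimate, half_width
```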

Our  contribution  to  the  artificial  intelligence  community  is  a  formalism  for  planning  with  asynchronous 
events  in  stochastic  environments.  We  base  this  formalism  on  an  established  model  in  queuing  theory,  the 
generalized semi-Markov process. Asynchronous stochastic systems have been largely absent from AI research
on  planning.  We  hope  that  we  can  inspire  further  research  on  this  topic  with  the  establishment  of  a  formal 
model  for  stochastic  decision  processes  with  asynchronous  events.  We  have  presented  two  approaches  to 
planning  with  asynchronous  events,  both  with  merits  and  limitations. 

For  goal  directed  planning,  we  have  developed  an  approach  based  on  the  Generate,  Test  and  Debug 
paradigm.  Statistical  model  checking  is  used  to  verify  policies,  and  the  simulation  traces  generated  during 
verification are used to guide policy repair. We have demonstrated that this approach can be used for automated policy repair. However, there is no guarantee that a repair step takes us closer to a solution, and the
selection  of  repair  steps  is  hard  to  automate  for  more  complex  bugs.  We  believe  that  the  analysis  techniques 
would  be  more  useful  as  an  aid  to  human  system  analysts  and  engineers.  To  make  this  work,  we  need  to 
develop  tools  for  visualizing  the  information  gathered  from  the  simulation  traces.  The  failure  scenarios  that 
we extract could be valuable information to a system analyst trying to debug a faulty system design.

To solve decision theoretic planning problems with asynchronous events, we have used phase-type distributions. We have experimented with different methods for approximating a general distribution with a phase-type distribution. We have shown that the introduction of phases makes it possible to generate policies of higher quality than if we simply assume that all events have exponential delay distributions. A limitation of our approach is that we cannot guarantee that the approximate solution is approximately optimal, although using more phases generally results in better policies. It is not even clear what the shape of an optimal policy for a GSMDP is, nor is it evident that optimal GSMDP planning is decidable in the general case. We take a pragmatic approach: we at least generate a policy that is almost always better than the one obtained by simply ignoring history dependence. A thorough theoretical analysis of the GSMDP formalism is currently lacking, and is a clear candidate for future research. We would also like to explore alternative approximate solution techniques for GSMDPs, including value function approximation.
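One simple way to turn a general delay distribution into a phase-type distribution is to match the mean with an n-phase Erlang distribution and choose n from the coefficient of variation. The sketch below illustrates this standard two-moment heuristic. It is not necessarily one of the approximation methods used in our experiments, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def erlang_fit(mean: float, variance: float,
               max_phases: Optional[int] = None) -> Tuple[int, float]:
    """Approximate a positive distribution by an n-phase Erlang distribution.

    An Erlang distribution with n phases, each with exit rate r, has mean n/r
    and squared coefficient of variation 1/n. We pick n closest to 1/cv^2
    (at least 1, at most max_phases) and then set r so that the mean matches.
    """
    if mean <= 0.0 or variance <= 0.0:
        raise ValueError("mean and variance must be positive")
    cv2 = variance / mean ** 2
    n = max(1, round(1.0 / cv2))
    if max_phases is not None:
        n = min(n, max_phases)  # caps state-space growth; variance then no longer matches
    return n, n / mean


def phase_transitions(n: int, rate: float) -> List[Tuple[int, int, float]]:
    """Phase chain 0 -> 1 -> ... -> n as (from, to, rate) triples.

    Reaching phase n means the event triggers; a planner would track the
    current phase as an extra state variable.
    """
    return [(k, k + 1, rate) for k in range(n)]
```

For example, a uniform distribution on [0, 2] (mean 1, variance 1/3) maps to three phases with rate 3 each. Any distribution whose coefficient of variation is at least one collapses to a single exponential phase, which is exactly the history-free approximation that phases are meant to improve on.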

It is clear that there are systems in the real world for which the Markov assumption is inappropriate.
This  is,  in  particular,  the  case  for  many  systems  with  asynchronous  events.  We  have  provided  practical 
techniques  for  verification  and  planning  for  such  systems.  We  have  presented  a  statistical  approach  to 
probabilistic  verification,  which  is  applicable  to  any  stochastic  discrete  event  system.  The  user  is  given 
only  probabilistic  correctness  guarantees,  but  the  alternative  is  to  use  an  approximate  model  amenable  to 
numerical verification techniques, and it is generally hard to quantify the effect that a model approximation
has  on  the  validity  of  the  verification  result.  For  planning,  we  have  demonstrated  that  the  use  of  phase-type 
distributions  can  allow  us  to  generate  control  policies  with  greater  expected  value  than  if  we  ignored  history 
dependence. Models with phase information are more complex and therefore take longer to solve. In
many  situations,  however,  we  need  to  generate  a  control  policy  only  once  for  a  system  and  the  same  policy 
can  be  used  repeatedly.  Even  a  small  increase  in  efficiency  of  a  manufacturing  process,  for  instance,  can  lead 
to  a  large  profit  increase  for  a  business.  In  future  research,  we  plan  to  identify  several  real-world  applications 
for  the  techniques  we  have  developed. 


Appendix  A 


Input  Language  for  Model  Checker 


The  experimental  results  presented  in  Chapter  6  were  generated  by  the  probabilistic  model  checker  Ymer.1 
The  input  language  used  by  Ymer  is  based  on  the  PRISM  language  (Parker  2002),  which  takes  inspiration 
from  Alur  and  Henzinger’s  (1999)  Reactive  Modules  formalism. 

A.1 Modular Specification of Stochastic Discrete Event Systems

The model of a stochastic discrete event system is specified as a set of asynchronous modules. Figure A.1 shows a GSMP model of a tandem queuing network and its representation in the Ymer input language. The model has two modules: serverC and serverM. A set of local state variables $SV_m$ and a set of events $E_m$ are associated with each module $m$. The state variables sc and sm in our simple example are used to record the number of items currently stored in each of the queues. A model can also have a set of global state variables $SV_g$. For the tandem queuing network, $SV_g$ is empty. The set of all state variables, $SV = SV_g \cup \bigcup_m SV_m$, constitutes a factored representation of the state space for the model.

Each event $e$ has an enabling condition $\phi_e$, which is a logic formula over the state variables $SV$. An event $e$ is enabled in a state $s$ if and only if $s \models \phi_e$. The enabled events in a state race to trigger first. The trigger time for each event $e$ is determined by a positive distribution $G_e$. Ymer currently supports the exponential, Weibull, lognormal, and uniform distributions. Only continuous distributions are permitted, in order to avoid complications arising from the simultaneous triggering of multiple events, which could be a source of nondeterminism. The triggering event in a state updates the values of state variables local to the module that the event is associated with. An event is also permitted to update global state variables, but cannot change the value of state variables that belong to a different module.

¹ Ymer web site: http://www.cs.cmu.edu/~lorens/ymer.html

[Figure A.1, left: diagram of the tandem queuing network. Items arrive at serverC with delay Exp(λ), are routed from serverC to serverM with delay W(η, β), and leave serverM with delay Exp(κ).]

gsmp

const n = 63;

rate lambda = 252; // 4*n
rate eta = 1;
rate beta = 1/2;
rate kappa = 4;

module serverC
sc : [0..n];

[] (sc<n) -> lambda : sc'=sc+1;

[route] (sc>0) -> W(eta,beta) : sc'=sc-1;
endmodule

module serverM
sm : [0..n];

[route] (sm<n) -> 1 : sm'=sm+1;

[] (sm>0) -> kappa : sm'=sm-1;
endmodule

Figure A.1: A tandem queuing network (left) and its representation in the input language used by Ymer (right).
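To make the race semantics concrete, here is a minimal Python sketch that simulates the model of Figure A.1 as a GSMP, independently of Ymer. It assumes that W(eta, beta) denotes a Weibull distribution with scale eta and shape beta; Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second. The sketch treats the two [route] commands as one composite event whose enabling condition is the conjunction of their guards.

```python
import random
from typing import Dict, Tuple


def simulate_tandem(n: int = 63, lam: float = 252.0, eta: float = 1.0,
                    beta: float = 0.5, kappa: float = 4.0,
                    horizon: float = 10.0, seed: int = 0) -> Tuple[int, int]:
    """Simulate the tandem queue of Figure A.1 as a GSMP; return (sc, sm) at `horizon`.

    Assumes W(eta, beta) is a Weibull distribution with scale eta and shape beta.
    """
    rng = random.Random(seed)
    samplers = {
        "arrive": lambda: rng.expovariate(lam),          # [] (sc<n) -> lambda
        "route": lambda: rng.weibullvariate(eta, beta),  # composite [route] event
        "serve": lambda: rng.expovariate(kappa),         # [] (sm>0) -> kappa
    }

    def enabled(sc: int, sm: int) -> Dict[str, bool]:
        # The composite route event needs both synchronized guards to hold.
        return {"arrive": sc < n, "route": sc > 0 and sm < n, "serve": sm > 0}

    sc, sm, now = 0, 0, 0.0
    clocks: Dict[str, float] = {}  # remaining time until each enabled event triggers
    while True:
        en = enabled(sc, sm)
        # GSMP semantics: events that stay enabled keep their partially
        # elapsed clocks, disabled events lose theirs, and newly enabled
        # events get a fresh sample from their delay distribution.
        clocks = {e: c for e, c in clocks.items() if en[e]}
        for e, on in en.items():
            if on and e not in clocks:
                clocks[e] = samplers[e]()
        winner = min(clocks, key=clocks.get)  # the race: smallest clock triggers
        delay = clocks[winner]
        if now + delay > horizon:
            return sc, sm
        now += delay
        clocks = {e: c - delay for e, c in clocks.items() if e != winner}
        if winner == "arrive":
            sc += 1
        elif winner == "route":
            sc, sm = sc - 1, sm + 1
        else:
            sm -= 1
```

Retaining the clock of the Weibull routing event while it stays enabled is what makes this a GSMP rather than a CTMC. For the exponential events, resampling would give the same distribution.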

It  is  possible  to  synchronize  the  update  of  state  variables  from  different  modules.  The  event  with  a 
Weibull  distribution  that  routes  messages  from  serverC  to  serverM  is  an  example  of  this  in  the  specification 
of  Figure  A.l.  There  is  one  event  in  each  module  with  a  synchronization  label  “route”,  and  these  two  events 
are paired into a single event. The condition for the composite event is the conjunction of the individual
event  conditions,  and  the  update  list  for  the  composite  event  is  the  concatenation  of  the  update  lists  for  the 
individual  events.  All  but  one  of  the  individual  events  must  have  an  exponential  trigger  time  distribution  with 
unit  rate,  specified  as  1.  The  trigger  time  distribution  for  the  composite  event  is  taken  from  the  individual 
event  that  has  a  different  trigger  time  distribution.  In  the  tandem  queuing  network  model,  the  trigger  time 
distribution for the composite event is taken from the event in the serverC module. Synchronizing events are not permitted to update the same global variable in an inconsistent manner, as this would lead to an underspecified model.
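The composition rule for synchronized events can be sketched as follows. The `Command` record and the `("exp", 1.0)` encoding of a unit-rate exponential are hypothetical representations chosen for this illustration, not Ymer's internal data structures. The consistency rule for global variables is approximated conservatively: the sketch rejects any variable that more than one command updates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Update = Tuple[str, Callable[[State], int]]  # (variable, new value computed from pre-state)
UNIT_EXP = ("exp", 1.0)  # hypothetical encoding of "exponential with rate 1"


@dataclass
class Command:
    label: str                      # synchronization label, e.g. "route"
    guard: Callable[[State], bool]  # enabling condition
    dist: tuple                     # e.g. ("weibull", 1.0, 0.5)
    updates: List[Update]


def compose(commands: List[Command]) -> Command:
    """Pair commands that share a synchronization label into one composite event."""
    labels = {c.label for c in commands}
    if len(labels) != 1:
        raise ValueError("commands must share exactly one synchronization label")
    special = [c.dist for c in commands if c.dist != UNIT_EXP]
    if len(special) > 1:
        raise ValueError("all but one command must use an exponential with unit rate")
    dist = special[0] if special else UNIT_EXP
    updates = [u for c in commands for u in c.updates]  # concatenated update lists
    targets = [var for var, _ in updates]
    if len(targets) != len(set(targets)):
        raise ValueError("a variable is updated by more than one command")

    def guard(s: State) -> bool:  # conjunction of the individual guards
        return all(c.guard(s) for c in commands)

    return Command(labels.pop(), guard, dist, updates)


def fire(cmd: Command, s: State) -> State:
    """Apply an event's updates; every right-hand side reads the pre-state."""
    new = dict(s)
    for var, f in cmd.updates:
        new[var] = f(s)
    return new
```

For the tandem queue, composing the two [route] commands yields a Weibull-distributed event that is enabled when sc > 0 and sm < n. Firing it moves one item from serverC to serverM.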


A.2 BNF Grammar


This  section  presents  the  full  syntax  for  Ymer’s  input  language  using  an  extended  BNF  notation  with  the 
following  conventions: 

• Each rule is of the form ⟨non-terminal⟩ ::= expansion.

• Alternative expansions are separated by a vertical bar (“|”).

• An asterisk (“*”) following a syntactic element x means zero or more occurrences of x.

• Terminals are written using typewriter font.

• Case is significant. For example, X and x are separate identifiers.

• Parentheses and square brackets are an essential part of the syntax and have no semantic meaning in the extended BNF notation.

• Any number of whitespace characters (space, newline, tab, etc.) may occur between tokens.


There are two top-level syntactic elements that may occur in an input file: ⟨model⟩ and ⟨property⟩. A ⟨name⟩ is a string of characters starting with an alphabetic character followed by a possibly empty sequence of alphanumeric characters, hyphens (“-”), and underscore characters (“_”). A ⟨pname⟩ is a name immediately followed by a prime symbol (“'”). An ⟨integer⟩ is a non-empty sequence of digits. A ⟨number⟩ is a sequence of numeric characters, possibly with a single decimal point (“.”) at any position in the sequence, or two integers separated by a slash (“/”). A ⟨probability⟩ is a number with a value in the interval [0, 1].


⟨model⟩ ::= ⟨model-type⟩ ⟨declaration⟩* ⟨module⟩*
⟨model-type⟩ ::= stochastic | ctmc | gsmp
⟨declaration⟩ ::= const ⟨name⟩ = ⟨integer⟩ ;
    | rate ⟨name⟩ = ⟨number⟩ ;
    | global ⟨name⟩ : ⟨range⟩ ;
    | global ⟨name⟩ : ⟨range⟩ init ⟨expr⟩ ;
⟨range⟩ ::= [ ⟨expr⟩ .. ⟨expr⟩ ]
⟨module⟩ ::= module ⟨name⟩ ⟨variable-decl⟩* ⟨command⟩* endmodule
    | module ⟨name⟩ = ⟨name⟩ [ ⟨substitution-list⟩ ] endmodule
⟨substitution-list⟩ ::= ⟨name⟩ = ⟨name⟩ | ⟨name⟩ = ⟨name⟩ , ⟨substitution-list⟩
⟨variable-decl⟩ ::= ⟨name⟩ : ⟨range⟩ ;
    | ⟨name⟩ : ⟨range⟩ init ⟨expr⟩ ;
⟨command⟩ ::= ⟨synchronization⟩ ⟨formula⟩ -> ⟨distribution⟩ : ⟨update⟩ ;
⟨synchronization⟩ ::= [ ] | [ ⟨name⟩ ]
⟨formula⟩ ::= ⟨formula⟩ & ⟨formula⟩ | ⟨formula⟩ “|” ⟨formula⟩ | ! ⟨formula⟩
    | ⟨expr⟩ ⟨binary-comp⟩ ⟨expr⟩ | ( ⟨formula⟩ )
⟨binary-comp⟩ ::= …
⟨distribution⟩ ::= ⟨rate-expr⟩ | Exp ( ⟨rate-expr⟩ ) | W ( ⟨rate-expr⟩ , ⟨rate-expr⟩ )
    | L ( ⟨rate-expr⟩ , ⟨rate-expr⟩ ) | U ( ⟨rate-expr⟩ , ⟨rate-expr⟩ )
⟨update⟩ ::= ⟨pname⟩ = ⟨expr⟩ | ⟨update⟩ & ⟨update⟩ | ( ⟨update⟩ )
⟨expr⟩ ::= ⟨integer⟩ | ⟨name⟩ | ⟨expr⟩ ⟨binary-op⟩ ⟨expr⟩ | ( ⟨expr⟩ )
⟨binary-op⟩ ::= + | - | *
⟨rate-expr⟩ ::= ⟨integer⟩ | ⟨name⟩ | ⟨rate-expr⟩ ⟨rate-op⟩ ⟨rate-expr⟩ | ( ⟨rate-expr⟩ )
⟨rate-op⟩ ::= * | /


⟨property⟩ ::= true | false | P ⟨pr-comp⟩ ⟨probability⟩ [ ⟨path-formula⟩ ]
    | ⟨property⟩ ⟨logic-op⟩ ⟨property⟩ | ! ⟨property⟩ | ⟨expr⟩ | ( ⟨property⟩ )
⟨pr-comp⟩ ::= < | <= | >= | >
⟨logic-op⟩ ::= => | & | “|”
⟨path-formula⟩ ::= ⟨property⟩ U ⟨property⟩ | X ⟨property⟩
    | ⟨property⟩ U <= ⟨number⟩ ⟨property⟩
    | ⟨property⟩ U [ ⟨number⟩ , ⟨number⟩ ] ⟨property⟩
    | X <= ⟨number⟩ ⟨property⟩ | X [ ⟨number⟩ , ⟨number⟩ ] ⟨property⟩
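In statistical model checking, each sampled trajectory is reduced to a single truth value for the path formula, and those values feed an acceptance sampling test. The sketch below shows that reduction for the until operator. It assumes the trajectory is stored as (entry time, state) pairs and that state properties are Python predicates; both are hypothetical representations. A path formula φ U <= t ψ corresponds to the window t1 = 0, t2 = t, and φ U [a, b] ψ to t1 = a, t2 = b.

```python
import math
from typing import Callable, List, Tuple

Trajectory = List[Tuple[float, object]]  # (entry time, state), sorted by time


def holds_until(traj: Trajectory, phi: Callable[[object], bool],
                psi: Callable[[object], bool],
                t1: float = 0.0, t2: float = math.inf) -> bool:
    """Decide phi U[t1,t2] psi on one piecewise-constant trajectory.

    State i is occupied on [traj[i][0], traj[i+1][0]); the last state is
    assumed to persist forever.
    """
    for i, (start, state) in enumerate(traj):
        end = traj[i + 1][0] if i + 1 < len(traj) else math.inf
        if start > t2:
            return False  # the time window closed before psi held
        if end > t1 and psi(state):
            # Earliest witness is max(start, t1). phi must hold strictly
            # before it, which includes [start, t1) when start < t1.
            return start >= t1 or phi(state)
        if not phi(state):
            return False  # phi failed before any witness was reached
    return False
```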


Appendix  B 


PPDDL+:  An  Extension  to  PDDL  for 
Modeling  Stochastic  Decision  Processes 


PDDL (Ghallab et al. 1998; McDermott 2000; Fox and Long 2003) is an established formalism for expressing deterministic planning domains and problems. We present PPDDL+, based on PDDL extensions proposed by Younes (2003) and PPDDL (Younes and Littman 2004). The latter was developed for the probabilistic track of the 2004 International Planning Competition. PPDDL+ extends PPDDL with facilities for modeling actions and events with delayed effects.


B.1 Delayed Actions, Reward Rates, and UTSL Goals

In  PPDDL,  time  is  measured  in  discrete  steps,  with  each  time  step  corresponding  to  the  execution  of  an 
action. Rewards are associated with state transitions. This is sufficient for modeling discrete-time MDPs,
but  not  continuous-time  MDPs  or  GSMDPs.  PPDDL+  introduces  delayed  actions  for  this  purpose. 

A delayed action $a$ defines a transition probability matrix $P_a$ and a reward vector $R_a$ in the same way as a regular PPDDL action. $P_a$ and $R_a$ can be computed from the effect formula for $a$ as described by Younes and Littman (2004). $P_a(i,j)$ is the probability of transitioning to state $j$ when $a$ triggers in state $i$, and $R_a(i)$ is the expected reward for a state transition caused by $a$ in state $i$. A positive distribution $G_a$ is also associated with each delayed action $a$. Let $F_a(t)$ be the cumulative distribution function of $G_a$. If $a$ becomes enabled at time $t_0$ and remains continuously enabled until $a$ triggers, then $F_a(t - t_0)$ is the probability that the triggering of $a$ occurs in the interval $(t_0, t]$. In addition to delayed actions, PPDDL+ supports delayed events, which have the same semantics as delayed actions except that they cannot be controlled by a decision maker.
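The role of $F_a$ can be illustrated with a short sketch. The helper names are hypothetical, and the Weibull parametrization by scale and shape is an assumption. The sketch also shows why non-exponential delays make the process history dependent. The chance of triggering in the next time unit depends on how long the action has already been enabled, except in the exponential case.

```python
import math
from typing import Callable

CDF = Callable[[float], float]


def exponential_cdf(t: float, rate: float) -> float:
    return 0.0 if t <= 0.0 else 1.0 - math.exp(-rate * t)


def weibull_cdf(t: float, scale: float, shape: float) -> float:
    return 0.0 if t <= 0.0 else 1.0 - math.exp(-((t / scale) ** shape))


def trigger_prob(cdf: CDF, t0: float, t: float) -> float:
    """P(a triggers in (t0, t]) for a delayed action enabled continuously since t0."""
    return cdf(t - t0)


def conditional_trigger_prob(cdf: CDF, t0: float, t: float, dt: float) -> float:
    """P(a triggers in (t, t + dt] | a has not triggered by t), enabled since t0."""
    survived = 1.0 - cdf(t - t0)
    if survived <= 0.0:
        raise ValueError("the action has triggered with certainty by time t")
    return (cdf(t + dt - t0) - cdf(t - t0)) / survived
```

For a unit-rate exponential delay, the conditional probability of triggering within the next time unit is 1 − e⁻¹ no matter how long the action has been enabled. For a Weibull delay with shape 2 and scale 1, it grows from about 0.63 at enabling time to about 0.95 after one time unit.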

With delayed actions and events, we are quantitatively measuring the time that is spent in a state before a state transition occurs. Therefore, PPDDL+ permits the specification of state-dependent reward rates in problem definitions. The PPDDL+ statement (:reward-rate φ k) specifies that a reward of k is awarded for every time unit that is spent in a state satisfying the formula φ.
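The sketch below shows how reward rates accumulate along a sampled trajectory. The representation is hypothetical. Two points are assumptions rather than PPDDL+ rules stated above: the rates of overlapping formulas are summed, and transition rewards are left out.

```python
import math
from typing import Callable, List, Tuple

Trajectory = List[Tuple[float, object]]              # (entry time, state), sorted by time
RewardRate = Tuple[Callable[[object], bool], float]  # (formula, reward per time unit)


def accumulated_reward(traj: Trajectory, rates: List[RewardRate],
                       horizon: float) -> float:
    """Reward earned from reward rates alone on [0, horizon).

    Assumption: when several reward-rate formulas hold in a state, their
    rates add up. Transition rewards (R_a) are not included.
    """
    total = 0.0
    for i, (start, state) in enumerate(traj):
        if start >= horizon:
            break
        end = traj[i + 1][0] if i + 1 < len(traj) else math.inf
        rate = sum(k for holds, k in rates if holds(state))
        total += rate * (min(end, horizon) - start)
    return total
```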

A final extension of PPDDL facilitates the specification of temporally extended goals in the form of UTSL goal conditions. The statement (:pctl-goal (pr 0.9 (until 100 φ ψ))), for example, corresponds to the UTSL formula $\mathcal{P}_{\geq 0.9}[\phi \,\mathcal{U}^{\leq 100}\, \psi]$. We can use this language feature to express the plan objective $\mathcal{P}_{\geq \theta}[\Diamond\, \phi]$, i.e. that $\phi$ is eventually achieved with probability at least $\theta$, which is commonly used by probabilistic planners (cf. Farley 1983; Blythe 1994; Goldman and Boddy 1994b; Kushmerick et al. 1995; Lesh et al. 1998). A regular PDDL goal condition (:goal φ) corresponds to the UTSL formula $\mathcal{P}_{\geq 1}[\Diamond\, \phi]$.


B.2  BNF  Grammar 

We  provide  the  full  syntax  for  PPDDL+  using  an  extended  BNF  notation  with  the  following  conventions: 

• Each rule is of the form ⟨non-terminal⟩ ::= expansion.

• Alternative expansions are separated by a vertical bar (“|”).

• A syntactic element surrounded by square brackets (“[” and “]”) is optional.

• Expansions and optional syntactic elements with a superscripted requirements flag are available only if the requirements flag is specified for the domain or problem currently being defined. For example, [⟨types-def⟩]^:typing (with :typing as the superscript) in the syntax for domain definitions means that ⟨types-def⟩ may occur only in domain definitions that include the :typing flag in the requirements declaration.

• An asterisk (“*”) following a syntactic element x means zero or more occurrences of x; a plus (“+”) following x means at least one occurrence of x.

• Parameterized non-terminals, for example ⟨typed list (x)⟩, represent separate rules for each instantiation of the parameter.

• Terminals are written using typewriter font.

• The syntax is Lisp-like. In particular, this means that case is not significant (e.g. ?x and ?X are equivalent), parentheses are an essential part of the syntax and have no semantic meaning in the extended BNF notation, and any number of whitespace characters (space, newline, tab, etc.) may occur between tokens.
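Because the syntax is Lisp-like, a reader for PPDDL+ input can start from a generic s-expression tokenizer. The sketch below folds case, ignores whitespace, and treats parentheses as structure. The function name is hypothetical, and comments starting with “;” are not handled.

```python
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]

TOKEN = re.compile(r"[()]|[^\s()]+")  # a parenthesis, or a run of other non-space characters


def read_sexpr(text: str) -> SExpr:
    """Parse one s-expression, folding case to lower (case is not significant)."""
    tokens = TOKEN.findall(text.lower())
    pos = 0

    def parse() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items: List[SExpr] = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(parse())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # consume ')'
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        return tok

    expr = parse()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return expr
```

For example, `read_sexpr("(:DELAYED-ACTION Move :delay (exponential 2))")` returns `[":delayed-action", "move", ":delay", ["exponential", "2"]]`.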


B.2.1  Domains 


The syntax for domain definitions is the same as for PDDL2.1, except that durative actions have been replaced by delayed actions. Declarations of constants, predicates, and functions are allowed in any order with respect to one another, but they must all come after any type declarations and precede any action declarations. A ⟨name⟩ is a string of characters starting with an alphabetic character followed by a possibly empty sequence of alphanumeric characters, hyphens (“-”), and underscore characters (“_”). A ⟨variable⟩ is a ⟨name⟩ immediately preceded by a question mark (“?”). For example, in-office and ball_2 are names, and ?gripper is a variable.


⟨domain⟩ ::= ( define ( domain ⟨name⟩ )
        [⟨require-def⟩]
        [⟨types-def⟩]^:typing
        [⟨constants-def⟩]
        [⟨predicates-def⟩]
        [⟨functions-def⟩]^:fluents
        ⟨structure-def⟩* )
⟨require-def⟩ ::= ( :requirements ⟨require-key⟩* )
⟨require-key⟩ ::= See Section B.2.4
⟨types-def⟩ ::= ( :types ⟨typed list (name)⟩ )
⟨constants-def⟩ ::= ( :constants ⟨typed list (name)⟩ )
⟨predicates-def⟩ ::= ( :predicates ⟨atomic formula skeleton⟩* )
⟨atomic formula skeleton⟩ ::= ( ⟨predicate⟩ ⟨typed list (variable)⟩ )
⟨predicate⟩ ::= ⟨name⟩
⟨functions-def⟩ ::= ( :functions ⟨function typed list (function skeleton)⟩ )
⟨function skeleton⟩ ::= ( ⟨function symbol⟩ ⟨typed list (variable)⟩ )
⟨function symbol⟩ ::= ⟨name⟩
⟨structure-def⟩ ::= See Section B.2.2
⟨typed list (x)⟩ ::= x* | ^:typing x+ - ⟨type⟩ ⟨typed list (x)⟩
⟨type⟩ ::= ( either ⟨primitive type⟩+ ) | ⟨primitive type⟩
⟨primitive type⟩ ::= ⟨name⟩
⟨function typed list (x)⟩ ::= x* | ^:typing x+ - ⟨function type⟩ ⟨function typed list (x)⟩
⟨function type⟩ ::= number


B.2.2  Actions 

Action definitions and goal descriptions have the same syntax as in PDDL2.1, with the addition of delayed actions and events. A ⟨number⟩ is a sequence of numeric characters, possibly with a single decimal point (“.”) at any position in the sequence. Negative numbers are written as (- ⟨number⟩), i.e. using negation.

⟨structure-def⟩ ::= ⟨action-def⟩
    | ^:delayed-actions ⟨delayed-action-def⟩
    | ^:exogenous-events ⟨delayed-event-def⟩
⟨action-def⟩ ::= ( :action ⟨name⟩
        [:parameters ( ⟨typed list (variable)⟩ )]
        [:precondition ⟨GD⟩]
        [:effect ⟨effect⟩] )
⟨delayed-action-def⟩ ::= ( :delayed-action ⟨name⟩
        [:parameters ( ⟨typed list (variable)⟩ )]
        :delay ⟨delay-distribution⟩
        [:condition ⟨GD⟩]
        [:effect ⟨effect⟩] )
⟨delayed-event-def⟩ ::= ( :delayed-event ⟨name⟩
        [:parameters ( ⟨typed list (variable)⟩ )]
        :delay ⟨delay-distribution⟩
        [:condition ⟨GD⟩]
        [:effect ⟨effect⟩] )
⟨GD⟩ ::= ⟨atomic formula (term)⟩ | ( and ⟨GD⟩* )
    | ^:equality ( = ⟨term⟩ ⟨term⟩ )
    | ^:equality ( not ( = ⟨term⟩ ⟨term⟩ ) )
    | ^:negative-preconditions ( not ⟨atomic formula (term)⟩ )
    | ^:disjunctive-preconditions ( not ⟨GD⟩ )
    | ^:disjunctive-preconditions ( or ⟨GD⟩* )
    | ^:disjunctive-preconditions ( imply ⟨GD⟩ ⟨GD⟩ )
    | ^:existential-preconditions ( exists ( ⟨typed list (variable)⟩ ) ⟨GD⟩ )
    | ^:universal-preconditions ( forall ( ⟨typed list (variable)⟩ ) ⟨GD⟩ )
    | ^:fluents ⟨f-comp⟩
⟨atomic formula (x)⟩ ::= ( ⟨predicate⟩ x* ) | ⟨predicate⟩
⟨term⟩ ::= ⟨name⟩ | ⟨variable⟩
⟨f-comp⟩ ::= ( ⟨binary-comp⟩ ⟨f-expr⟩ ⟨f-expr⟩ )
⟨binary-comp⟩ ::= > | < | = | >= | <=
⟨f-expr⟩ ::= ⟨number⟩ | ⟨f-head (term)⟩
    | ( ⟨binary-op⟩ ⟨f-expr⟩ ⟨f-expr⟩ ) | ( - ⟨f-expr⟩ )
⟨f-head (x)⟩ ::= ( ⟨function symbol⟩ x* ) | ⟨function symbol⟩
⟨binary-op⟩ ::= + | - | * | /


The syntax for effects has been extended to allow for probabilistic effects, which can be arbitrarily interleaved with conditional effects and universal quantification. A ⟨probability⟩ is a ⟨number⟩ with a value in the interval [0, 1]. Reward updates are limited to constant increments and decrements.

⟨effect⟩ ::= ⟨p-effect⟩ | ( and ⟨effect⟩* )
    | ^:conditional-effects ( forall ( ⟨typed list (variable)⟩ ) ⟨effect⟩ )
    | ^:conditional-effects ( when ⟨GD⟩ ⟨effect⟩ )
    | ^:probabilistic-effects ( probabilistic ⟨prob-effect⟩+ )
⟨p-effect⟩ ::= ⟨atomic formula (term)⟩ | ( not ⟨atomic formula (term)⟩ )
    | ^:fluents ( ⟨assign-op⟩ ⟨f-head (term)⟩ ⟨f-expr⟩ )
    | ^:rewards ( ⟨additive-op⟩ ⟨reward fluent⟩ ⟨f-expr⟩ )
⟨prob-effect⟩ ::= ⟨probability⟩ ⟨effect⟩
⟨assign-op⟩ ::= assign | scale-up | scale-down | ⟨additive-op⟩
⟨additive-op⟩ ::= increase | decrease
⟨reward fluent⟩ ::= ( reward ) | reward


Five  families  of  parametric  distributions  are  supported  by  PPDDL+.  A  delay  distribution  that  is  simply 
a  constant  expression  corresponds  to  a  deterministic  distribution.  Implementations  may  not  support  all  the 
distributions,  and  should  report  an  error  if  they  encounter  an  unsupported  distribution  in  a  domain  definition. 
For example, a planning system for continuous-time MDPs would support only the one-parameter exponential distribution. Furthermore, support for deterministic distributions should be implemented with care. If
two  events  with  deterministic  delay  can  be  enabled  simultaneously,  there  could  be  a  non-zero  probability 
that  both  events  trigger  at  the  same  time. 

⟨delay-distribution⟩ ::= ⟨const-expr⟩
    | ( exponential ⟨const-expr⟩ [⟨const-expr⟩] )
    | ( weibull ⟨const-expr⟩ [⟨const-expr⟩] [⟨const-expr⟩] )
    | ( lognormal ⟨const-expr⟩ ⟨const-expr⟩ )
    | ( uniform ⟨const-expr⟩ ⟨const-expr⟩ )
⟨const-expr⟩ ::= ⟨number⟩
    | ( ⟨binary-op⟩ ⟨const-expr⟩ ⟨const-expr⟩ ) | ( - ⟨const-expr⟩ )


B.2.3  Problems 

The syntax for problem definitions includes the extensions of PPDDL to PDDL2.1 that allow for the specification of a probability distribution over initial states, and also permit the association of a one-time reward with entering a goal state. In PPDDL+, it is also possible to specify a goal condition as a UTSL formula.

⟨problem⟩ ::= ( define ( problem ⟨name⟩ )
        ( :domain ⟨name⟩ )
        [⟨require-def⟩]
        [⟨objects-def⟩]
        [⟨init⟩]
        [⟨reward-rate-spec⟩]^:rewards
        ⟨goal⟩ )
⟨objects-def⟩ ::= ( :objects ⟨typed list (name)⟩ )
⟨init⟩ ::= ( :init ⟨init-el⟩* )
⟨init-el⟩ ::= ⟨p-init-el⟩ | ^:probabilistic-effects ( probabilistic ⟨prob-init-el⟩* )
⟨p-init-el⟩ ::= ⟨atomic formula (name)⟩ | ^:fluents ( = ⟨f-head (name)⟩ ⟨number⟩ )
⟨prob-init-el⟩ ::= ⟨probability⟩ ⟨a-init-el⟩
⟨a-init-el⟩ ::= ⟨p-init-el⟩ | ( and ⟨p-init-el⟩* )
⟨reward-rate-spec⟩ ::= ( :reward-rate ⟨state-reward⟩* )
⟨state-reward⟩ ::= ⟨GD⟩ ⟨const-expr⟩
⟨goal⟩ ::= ⟨goal-spec⟩ [⟨metric-spec⟩] | ⟨metric-spec⟩
⟨goal-spec⟩ ::= ( :goal ⟨GD⟩ ) [( :goal-reward ⟨ground-f-expr⟩ )]^:rewards
    | ^:utsl-goals ( :utsl-goal ⟨pctl-formula⟩ )
⟨metric-spec⟩ ::= ( :metric ⟨optimization⟩ ⟨ground-f-expr⟩ )
⟨optimization⟩ ::= minimize | maximize
⟨ground-f-expr⟩ ::= ⟨number⟩ | ⟨f-head (name)⟩
    | ( ⟨binary-op⟩ ⟨ground-f-expr⟩ ⟨ground-f-expr⟩ )
    | ( - ⟨ground-f-expr⟩ )
    | ( total-time ) | total-time
    | ( goal-achieved ) | goal-achieved
    | ^:rewards ⟨reward fluent⟩
⟨pctl-formula⟩ ::= ( pr ⟨probability⟩ ⟨path-formula⟩ )
    | ( not ( pr ⟨probability⟩ ⟨path-formula⟩ ) )
⟨path-formula⟩ ::= ( until [⟨number⟩] [⟨number⟩] ⟨GD⟩ ⟨GD⟩ )
    | ( weak-until [⟨number⟩] [⟨number⟩] ⟨GD⟩ ⟨GD⟩ )
    | ( eventually [⟨number⟩] [⟨number⟩] ⟨GD⟩ ⟨GD⟩ )
    | ( continuously [⟨number⟩] [⟨number⟩] ⟨GD⟩ ⟨GD⟩ )


B.2.4  Requirements 

Below is a table of all requirements in PPDDL+. Some requirements imply others; some are abbreviations for common sets of requirements. If a domain stipulates no requirements, it is assumed to declare a requirement for :strips.

Requirement                   Description
:strips                       Basic STRIPS-style adds and deletes
:typing                       Allow type names in declarations of variables
:equality                     Support = as built-in predicate
:negative-preconditions       Allow negated atoms in goal descriptions
:disjunctive-preconditions    Allow disjunctive goal descriptions
:existential-preconditions    Allow exists in goal descriptions
:universal-preconditions      Allow forall in goal descriptions
:quantified-preconditions     = :existential-preconditions + :universal-preconditions
:conditional-effects          Allow when and forall in action effects
:probabilistic-effects        Allow probabilistic in action effects
:rewards                      Allow reward fluent in action effects and optimization metric
:fluents                      Allow numeric state variables
:utsl-goals                   Allow UTSL goal conditions
:delayed-actions              Allow actions with random delay
:exogenous-events             Allow uncontrollable events with random delay
:adl                          = :strips + :typing + :equality + :negative-preconditions
                                + :disjunctive-preconditions + :quantified-preconditions
                                + :conditional-effects
:mdp                          = :probabilistic-effects + :rewards
:gsmdp                        = :mdp + :delayed-actions + :exogenous-events
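The abbreviations in the table can be expanded mechanically. The sketch below uses only the implications listed in the table; the function and dictionary names are hypothetical. Requirement keys are lower-cased because case is not significant.

```python
from typing import Dict, Iterable, Set

# Abbreviations taken from the requirements table above.
IMPLIES: Dict[str, Set[str]] = {
    ":quantified-preconditions": {":existential-preconditions", ":universal-preconditions"},
    ":adl": {":strips", ":typing", ":equality", ":negative-preconditions",
             ":disjunctive-preconditions", ":quantified-preconditions",
             ":conditional-effects"},
    ":mdp": {":probabilistic-effects", ":rewards"},
    ":gsmdp": {":mdp", ":delayed-actions", ":exogenous-events"},
}


def expand_requirements(declared: Iterable[str]) -> Set[str]:
    """Close a requirements declaration under the abbreviations in IMPLIES."""
    result = {r.lower() for r in declared} or {":strips"}  # default when none declared
    stack = list(result)
    while stack:
        req = stack.pop()
        for implied in IMPLIES.get(req, ()):
            if implied not in result:
                result.add(implied)
                stack.append(implied)  # implied keys may be abbreviations themselves
    return result
```

For example, declaring :gsmdp transitively enables :mdp, :probabilistic-effects, :rewards, :delayed-actions, and :exogenous-events.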